


MAC TR-149 



A PORTABLE COMPILER FOR THE LANGUAGE C 

Alan Snyder 



May 1975 



MASSACHUSETTS INSTITUTE OF TECHNOLOGY 

PROJECT MAC 

CAMBRIDGE MASSACHUSETTS 02139 






A PORTABLE COMPILER FOR THE LANGUAGE C

by 

Alan Snyder



ABSTRACT 

This paper describes the implementation of a compiler for the programming language C. The compiler has
been designed to be capable of producing assembly-language code for most register-oriented machines
with only minor recoding. Most of the machine-dependent information used in code generation is
contained in a set of tables which are constructed automatically from a machine description provided by
the implementer. In the machine description, the implementer models the target machine by defining a
machine-dependent abstract machine for which the code generator produces intermediate code. The
abstract machine is abstract in that it is a C machine: its registers and memory are defined in terms of
primitive C data types and its instructions perform basic C operations. The abstract machine is
machine-dependent in that there is a close correspondence between the registers of the abstract machine
and those of the target machine, and between the behavior of the abstract machine instructions and the
corresponding target machine instructions or instruction sequences. The implementer defines the
translation from an abstract machine program to a target machine program by providing in the machine
description a set of simple macro definitions for the abstract machine instructions. In addition, macro
definitions may be provided in the form of C routines where additional processing capability is needed.

1. Introduction

This paper describes the implementation of a compiler for the programming language C [1,2], an 
implementation language developed at Bell Laboratories and a descendant of the language BCPL [3]. The 
compiler has been designed to be capable of producing assembly-language code for most register- 
oriented machines with only minor recoding. Versions of the compiler exist for the Honeywell HIS-6000 
and Digital Equipment Corporation PDP-10 computers.

C is a procedure-oriented language. It has four primitive data types (integers, characters, and single-
and double-precision floating-point), four data type constructors (pointers, arrays, functions, and
records), and a small but convenient set of control structures which encourage goto-less programming. An
important characteristic of C is the minimal run-time support needed. Although C supports recursive
procedures, C does not have built-in functions, I/O statements, block structure, string operations, dynamic
arrays, dynamic storage allocation, or run-time type checking. The only run-time data structure is the
stack of procedure activation records. Of course, to run any useful programs, an interface to the
operating system is required, and a standard set of I/O routines has been defined in order to encourage
portability. But the implementation of these routines is optional and separate from the task of
implementing a C compiler which produces code for a given machine.

The compiler described in this paper was designed to be portable, that is, to be capable of generating 
code for many target machines with a minimum of recoding. When considering portability, three classes of 
machines can be defined: 

1. Machines which can support C programs reasonably efficiently: This class of machines depends only 
upon one's interpretation of the term "reasonably efficiently." Clearly, all real machines can run C 
programs, limited only by some size constraint related to the availability of memory. However, the 
following capabilities are desirable: (1) the ability to access the current procedure activation record 
and the current argument list in a reentrant manner - this will require one or two base/index 
registers depending upon the calling sequence, (2) the ability to reference via a pointer variable - 
this will require another base/index register or an indirection facility, (3) character addressing, (4) 
integer arithmetic, and (5) floating-point arithmetic. Not all of the above capabilities need be present 
in the target machine; however, the more that are missing, the more interpretive becomes the 
execution of a C program. For example, the HIS-6000 is word-addressed; thus references to 
character variables are interpreted by a small run-time subroutine. 

2. Machines for which the compiler can produce reasonably efficient code: This class of machines is 
clearly a subset of the first class; the size of the subset is again determined by one's definition of 
reasonable. The better the correspondence between the target machine and the machine model 
implicit in the compiler, the better will be the object code produced. On the other hand, if the
correspondence is poor, the compiler may be able to produce only threaded code or instructions to 
be interpreted by software. 

3. Machines which can support the compiler itself: Because the compiler is written in C, one may think 
that this class of machines is identical to the second class of machines; however, there are added 
restrictions which must be made in order to run the compiler on a given machine: the word size of 
the machine must be sufficient to hold all values used by the compiler; any implementation restriction 
on the size of procedures or data areas (as would be likely on the IBM S/360 because of addressing 
deficiencies) must not be such as to prohibit the proper execution of the compiler (this includes the 
ability of the compiler to compile itself). In addition, there are operating system and configuration 
restrictions: the memory size available to a program must be sufficient to hold the phases of the 
compiler; file space for the source of the compiler must be available and affordable; the I/O routines 
used by the compiler must be implemented. This class of machines is not a subset of the second class 
of machines since the compiler does not use all of the features of the language, notably floating-point. 

This paper concentrates on the second class of machines, those for which the compiler can produce
reasonably efficient code, given the restrictions of the first class of machines, those which can support C
programs reasonably efficiently. Thus, throughout this paper, the term "machine independence" will
generally refer to the ability of a compiler to produce code for many machines.

1.1 Motivation 

One of the serious problems in the field of software engineering is the difficulty of transferring programs
to new machines. This is caused in large part by the proliferation of different programming languages
and machines and the significant effort required to implement a compiler for any particular programming
language and target machine. One approach to solving this problem is to restrict programming languages
to a few standardized languages which are then implemented on all target machines of interest. A
disadvantage of this approach is that it conflicts with the desirability of having many specialized
languages for specialized problems. Another disadvantage is the fact that continual progress is being
made in the development of programming languages so that by the time a language is standardized and
widely available, it is already "obsolete." It is also difficult to achieve compatibility among the various
implementations of a standardized language. Even if the standard language is well defined, it is difficult
for compiler writers to restrain themselves from extending it and for users to restrain themselves from
using the language extensions. A similar approach to the problem of program transferability is to restrict
the number of target machines for which compilers must be written by requiring that each new machine
be compatible with a widely-used existing machine. The stifling of progress in computer architecture
which would result from this requirement is as undesirable as the stifling of progress in programming
languages which would result from adoption of the previous approach. In addition, if the new machines
are only upward compatible with the old machines, then problems may still remain with regard to
transferring programs from new machines to old ones.

An alternative approach to those of language restriction and machine compatibility is to develop
techniques that reduce the effort required to write compilers for various combinations of languages and
machines. These techniques may be directed at two subproblems: that of reducing the effort involved in
writing one particular compiler and that of reducing the effort involved in writing a family of related
compilers. The development of such techniques could have benefits in addition to improving program
transferability, such as making it easier to implement a new language or making languages more widely
available.

An early effort in this direction was an attempt to devise a universal computer-oriented language UNCOL
[4], which is both language-independent and machine-independent, to which all programming languages
could be translated and which itself could be translated with acceptable efficiency into any machine
language. The idea was that one need write only one UNCOL-to-machine language translator for each
target machine and one source language-to-UNCOL translator for each source language, rather than
having to write one compiler for each source language-machine language combination. In addition, if
UNCOL were well defined, then the various implementations of UNCOL could be made compatible, thereby
insuring the compatibility of the source language implementations. Unfortunately, the concept of a
universal language has not led to a practical solution of the problem; the characteristics of source and
machine language independence are incompatible with the need for acceptably efficient translation from
UNCOL to machine language.

More practical techniques for reducing the effort involved in writing compilers result if one considers
techniques with more limited goals than those of the UNCOL project. One approach is to develop
techniques which reduce the effort involved in writing one particular compiler for some language-machine
combination. Examples of such techniques are parser generators and syntax-directed symbol processors
[5]. Another approach is to develop techniques for writing families of compilers for many source
languages and one target machine. An example of such a technique is a compiler writing system with
code generation primitives, such as FSL [6]. The third approach, and the one which is taken in this work,
is that of the portable compiler, a compiler for a particular source language which can produce code for
many target machines. It should be noted that techniques such as parser generators, which can aid in the
implementation of a single compiler, can be equally useful in the implementation of more general systems
such as compiler writing systems and portable compilers.



1.2 Background 

A compiler can be considered to consist of two logical phases, analysis and generation. The analysis 
phase performs lexical and syntactic analysis of the source program, producing as output some convenient 
internal representation of the program, along with a set of tables containing lexical information and other 
information derived from the declarative statements of the program. The generation phase then 
transforms the internal representation into an object language program, using the information contained in 
the tables produced by the analysis phase. One can confine the machine (object language) dependencies 
of a compiler to the generation phase by a suitable choice of internal representation, i.e., one which is
machine-independent. On the other hand, it is not practical to also confine the source language
dependencies of a compiler to the analysis phase since this would make the internal representation a
universal language. Thus the generation phase of a compiler is both source-language-dependent and 
machine-dependent. 

Most portable compilers require that the generation phase be completely rewritten for each target 
machine [7,8]. This effort may represent only about one-fifth of the effort needed to rewrite the entire 
compiler [8]. In the case of the BCPL compiler [9], for example, moving the compiler may require only
three to four weeks under ideal conditions (but otherwise may require up to five months). However, it 
would be desirable if the amount of recoding necessary to generate code for a new machine could be 
reduced. 

One approach is that advocated by Poole and Waite for writing portable programs [10,11]. They 
advocate that before writing a program to solve a particular problem, one define an abstract machine for 
which the program is then written. With this approach, in order to move the program to a new machine, 
one need only implement the abstract machine on the target machine, typically via a macro processor. 
The desired qualities of the abstract machine are that it contain operations and data objects convenient 
for expressing the problem solution, that it be sufficiently close to the target machines of interest so that 
acceptable code can easily be generated, and that the tools for implementing the abstract machine be 
easily obtainable on the target machines. 

This technique can be applied to portable compilers by considering the problem to be the implementation 
of an arbitrary source language program. The operations and data objects convenient for expressing the 
problem solution are then those which are basic to the source language. With this technique, a compiler
would be broken into two parts: a machine-independent translator from the source language to the 
abstract machine language and a machine-dependent translator from the abstract machine language to the 
target machine language. The translator from the abstract machine language to the target machine 
language should be smaller and simpler than the conventional generation phase would be; typically, it 
consists of a set of macro definitions which map each abstract machine instruction into the corresponding 
target machine instruction or instruction sequence. Moving the compiler to a new machine simply requires 
rewriting the macro definitions. 
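
As a rough illustration of the macro-definition idea described above, the following C sketch expands one abstract machine instruction into assembly text for a made-up accumulator machine. Everything in it is an assumption made for illustration: the opcode names, the struct layout, and the target mnemonics (LDA, ADA, SBA, STA) are hypothetical and are not taken from the compiler described in this paper.

```c
#include <stdio.h>

/* Hypothetical abstract machine instructions: result = op1 OP op2. */
enum am_op { AM_ADD, AM_SUB };

struct am_instr {
    enum am_op op;
    const char *result;   /* symbolic name of the result location */
    const char *op1;      /* symbolic name of the first operand   */
    const char *op2;      /* symbolic name of the second operand  */
};

/* Each case plays the role of one "macro definition": a fixed template
   that maps an abstract instruction onto an instruction sequence of an
   invented target. Moving to another target means rewriting only these
   templates. Returns the number of characters written, as snprintf does. */
int expand(const struct am_instr *in, char *buf, size_t size)
{
    switch (in->op) {
    case AM_ADD:
        return snprintf(buf, size, "LDA %s\nADA %s\nSTA %s\n",
                        in->op1, in->op2, in->result);
    case AM_SUB:
        return snprintf(buf, size, "LDA %s\nSBA %s\nSTA %s\n",
                        in->op1, in->op2, in->result);
    }
    return -1;   /* unknown abstract instruction */
}
```

Calling expand on the instruction { AM_ADD, "c", "a", "b" } fills the buffer with a three-line load/add/store sequence. The machine-independent part of such a translator never needs to know which mnemonics appear in the templates.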

The major difficulty with the abstract machine approach to portable software is in determining the 
appropriate abstract machine. If the abstract machine is of a high level (i.e., very problem-oriented), then 
the program will be very portable but the implementation of the abstract machine will be difficult. On the 
other hand, if the abstract machine is of a low level (i.e., more machine-oriented), then, unless it 
corresponds closely to the target machine, either the code produced will be inefficient or the 
implementation will be complicated by optimization code. 

The solution to this difficulty proposed by Poole and Waite is to define a hierarchy of abstract machines,
ranging from a high-level problem-oriented abstract machine to a low-level, machine-oriented, and
easy-to-implement abstract machine. In this solution, the higher-level abstract machines are implemented
in terms of the lower-level abstract machines, and only the lowest-level abstract machine need be
implemented on a target machine in order to transfer the program; once it is transferred, higher-level
abstract machines may be implemented directly in terms of the target machine in order to improve
efficiency. While this technique may be useful for transferring particular programs, it is unlikely that it
will be acceptable in practical terms as a compilation technique because of the need for additional
translation steps. An experiment by Brown [12] indicates that one may implement and then optimize a
low-level abstract machine in about the same time as it takes to implement a higher-level abstract
machine and that the resulting implementations are similarly efficient. Thus an alternative solution is to
use a low-level abstract machine, but allow the implementer to optimize as desired; this solution is more
likely to be acceptable as a compilation technique. A third solution will be advocated in this paper.

The technique of rewriting the generation phase requires that a non-trivial translator from the internal
representation to the target machine language be written for each new target machine. Similarly, the
abstract machine approach requires that a translator from the abstract machine language to the target
machine language be written for each new target machine. If reasonably efficient code is desired and the
abstract machine does not correspond very closely to the target machine, then this translator will also be
non-trivial.

A more desirable goal for a portable compiler is that it have a generation phase which can be modified to
produce code for a new target machine by a process which is largely automatic. Implicit in this goal is
the requirement that the modification process obtain its knowledge about a target machine from a
(non-procedural) description of the machine. An early effort in this direction was the SLANG system [13]
which attacked the problem of describing a machine-dependent process (code generation) in a machine-
independent way. In the SLANG system, source language constructs are translated into a set of basic
operations called EMILs; the EMILs are translated into absolute machine code using macro definitions and
instruction format definitions. The EMILs can be considered to be the instructions of an abstract machine;
the difference is that the code generation algorithm uses information contained in a machine description
in order to tailor the EMIL program to the target machine. The EMILs differ from the instructions of a
Poole and Waite abstract machine in that they are machine-oriented rather than problem (source-language)
oriented. In addition, the code generator does not seem to know about registers other than index
registers, which implies that one will not be able to achieve the desired close correspondence between
the abstract machine and most register-oriented machines. Nevertheless, the method of describing the
instructions of a machine by providing simple instruction sequences which interpret the abstract machine
instructions seems to be a good compromise between the desire to minimize code and the difficulty of
mathematically defining a machine and utilizing such a definition in generating code.

More recently, Miller [14] has explored the problem of constructing a code generator from a machine
description. Miller proposes that a generation phase be constructed in two steps. In the first step, the
language designer specifies the language-dependent part of the generation phase by writing a set of
procedural machine-independent macro definitions for the operations of the internal representation
produced by the analysis phase. These macro definitions define the operations of the internal
representation, such as addition, in terms of machine-independent (i.e., language-oriented) primitives, such
as integer addition, which are created by the language designer. In the second step, the implementer
provides a description of the target machine which is used by an automatic code generation system
named DMACS (Descriptive Macro System) in order to fill out the macro definitions of the first step and
thereby produce a code generator for the target machine. As was the case with the SLANG system, the
DMACS machine description defines the primitive operations by giving target machine code sequences
which interpret them. In addition, however, the permitted locations of the operands (in terms of their
being in memory or in particular registers) are specified, as are the corresponding result locations. Thus
the primitives can be made to correspond very closely to the instructions of the target machine so that
the code sequences in the machine description are simple and the resulting object code is more efficient.

Both the SLANG system and DMACS are intended to be general in that they are not designed for a
specific source language. However, true generality is difficult to obtain, and the systems do reflect
preconceived notions about source languages. It is believed that, since there are much more significant
variations among languages than among machines, a practical implementation of a compiler for any
interesting language requires that the system be designed specifically for that language. This idea was
recognized to some extent in DMACS where the primitives are created by the language designer as
convenient for expressing the operations of the source language. On the other hand, DMACS contains no
notion of storage classes (different mechanisms for accessing variables of the same data type) which are 
needed for C; the implementation of storage classes is machine-dependent and thus must be defined in 
the machine description. In this paper, techniques similar to those used in the SLANG system and in 
DMACS are used in the implementation of a portable C compiler. 

1.3 Method 

The goal of this research is to design a generation phase for a C compiler which can be modified to 
produce code for many machines by a process which is largely automatic. Some insight into this problem 
can be gained by examining the corresponding, but better understood problem of the automatic 
construction of an analysis phase. One common approach is the use of a parser generator [15]. A parser 
generator is a program which accepts as input a grammar for a source language and produces as output a 
set of tables which are used by a language-independent parsing algorithm. The parsing algorithm is 
supplemented by a set of action routines which are provided by the implementer; these action routines
are called by the parsing algorithm at appropriate points to produce the output of the analysis phase. 
The important characteristics of this process are as follows: 

1. The analysis phase is divided into two parts, a language-independent part (the parsing algorithm) and 
a language-dependent part (the parsing tables and the action routines). 

2. The language-dependent tables are constructed automatically from a finite description of the language 
(the grammar). 

3. The analysis phase is "filled-in" by the implementer by providing information in a procedural form (the 
action routines). 

4. The choice of a specific parsing algorithm determines the class of languages which can be handled by 
the analysis phase. 

The process of constructing an analysis phase can be made more automatic through the use of a compiler 
writing system. In a compiler writing system, the action routines are in a sense built-in; the implementer 
invokes these action routines from a higher-level description of the translation. The use of such a system 
may involve much less effort than would be required to write a complete set of action routines. However, 
the important point here is that the use of built-in knowledge, as opposed to allowing the addition of 
arbitrary procedural knowledge, restricts the class of translations (and thus source languages) which can 
be handled by the automatically generated analysis phase. 

For the compiler described in this paper, techniques analogous to those described in the preceding 
paragraph are used in the implementation of the generation phase. The generation phase is split into two 
parts, a machine-independent part and a machine-dependent part. The machine-independent part of the 
generation phase is a machine-independent code generation algorithm, corresponding to the language- 
independent parsing algorithm of the analysis phase. Just as the choice of a particular parsing algorithm 
limits the class of languages that the analysis phase can handle (the parsing algorithm is not completely 
language-independent), the choice of a particular code generation algorithm determines the class of 
machines for which the compiler can produce reasonable (non-interpretive) code. The machine-dependent 
part of the generation phase consists of a set of tables produced automatically by a stand-alone program 
GT (Generate Tables) from a machine description, which corresponds to the grammar in the construction of 
an analysis phase. The information contained in the machine description may be supplemented by a set of 
routines which correspond to the action routines of the analysis phase. However, the compiler described 
in this paper is closer to the compiler writing system approach in that implementer-supplied routines form 
only a minor part of the generation phase. The extent to which the implementer can easily and safely 
include such routines in the generation phase represents another factor determining the class of target 
machines handled. 

A code generation algorithm, if it is to be machine-independent, requires a model of a machine with which
to work. This model may express such notions as memory, registers, addressing, operations, and
hardware data types. In the machine description, the implementer defines his target machine in terms of
this model and also specifies the form of the object language. The class of machines for which the code
generator can produce acceptable code directly corresponds to the generality of the machine model.

The machine model used by the C compiler is a C machine: a machine whose registers and memory are
described in terms of the primitive C data types and whose operations are primitive C operations. The
implementer models the target machine in terms of a C machine, producing an abstract machine. The
abstract machine may be very similar to or very different from the target machine, depending upon how
closely the target machine fits the machine model. The code generation algorithm, using this machine
model, produces code for the abstract machine. The "assembly" language of the abstract machine is called
the intermediate language; an intermediate language program, which is in the form of a series of macro
calls, is translated into the target machine assembly language using a set of macro definitions provided by
the implementer in the machine description. Assembly language was chosen over machine language for
the output of the compiler because it is far easier to describe and produce in a machine-independent
manner than machine code or object modules.

The abstract C machine plays the same role in the C compiler as would a Poole and Waite abstract
machine. The difference is that instead of there being one fixed abstract machine, there is a class of
abstract machines, corresponding to the variability in the machine model. This variability allows the
implementer to define a particular abstract machine which more closely resembles his target machine.
The result is that the translation from the abstract machine language to the target machine language
becomes simpler, and more efficient code is produced.

The process of modeling the target machine is described in chapter two. A detailed discussion of the
code generation algorithm is presented in chapter three. Conclusions are presented in chapter four.

2. Modeling the Target Machine

The code generator's model of a machine is an abstract C machine, a machine whose instructions perform
the primitive operations of the C language. The data types of the abstract machine are the primitive C
data types (characters, integers, and single- and double-precision floating point), supplemented by one or
more pointer classes which are distinguished by their ability to resolve addresses. The basic addressable
unit of the abstract machine memory is the byte, which holds a single character value (characters are the
smallest C data type). Values of the other abstract machine data types occupy an integral number of
bytes, possibly aligned in larger units of memory. The abstract machine has a set of registers which may
be used to hold the operands of the abstract machine instructions. Each abstract machine register is
capable of holding values of some subset of the abstract machine data types. The instructions of the
abstract machine are three-address instructions. Each address may specify an abstract machine register
or a location in memory; the mechanisms for referencing memory locations correspond to the primitive
addressing modes in C.

In the machine description, the implementer describes the target machine in terms of this machine model
by defining a particular abstract machine for which the code generator produces intermediate code. The
implementer specifies the sizes and alignments of the primitive C data types and defines pointer classes
as convenient. The implementer defines the abstract machine registers, which generally correspond to
those registers of the target machine which are to be used in the evaluation of expressions. The
implementer also specifies the registers which may hold values of each of the abstract machine data
types. In addition, the implementer may specify that any two abstract machine registers conflict in the
target machine, meaning that only one may hold a value at any one time. The implementer defines the
abstract machine instructions in terms of their operand/result locations and possible side-effects on other
registers. In addition, the implementer provides a set of macro definitions which implement the abstract
machine instructions on the target machine.
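
To make the contents of such a machine description more concrete, here is a minimal C sketch of the kind of tables it might yield: type sizes and alignments, the data types each register may hold, and register conflicts. The struct layouts, the register names (A, Q, F0), and the 4-byte word are all invented for this example; they are not the tables actually produced by GT.

```c
/* Abstract machine data types (pointer classes omitted for brevity). */
enum am_type { T_CHAR, T_INT, T_FLOAT, T_DOUBLE, T_COUNT };

struct type_desc { int size; int align; };   /* both in bytes */

struct reg_desc {
    const char *name;
    unsigned types;       /* bit i set: register may hold am_type i  */
    unsigned conflicts;   /* bit j set: conflicts with register j    */
};

/* Invented example target: 4-byte words, two integer registers, and a
   floating-point register F0 that overlaps both of them. */
static const struct type_desc types[T_COUNT] = {
    [T_CHAR] = {1, 1}, [T_INT] = {4, 4}, [T_FLOAT] = {4, 4}, [T_DOUBLE] = {8, 8},
};

static const struct reg_desc regs[] = {
    { "A",  (1u << T_CHAR)  | (1u << T_INT),    (1u << 2) },
    { "Q",  (1u << T_CHAR)  | (1u << T_INT),    (1u << 2) },
    { "F0", (1u << T_FLOAT) | (1u << T_DOUBLE), (1u << 0) | (1u << 1) },
};

/* May register r hold a value of type t? */
int reg_can_hold(int r, enum am_type t)
{
    return (int)((regs[r].types >> t) & 1u);
}

/* Do registers a and b conflict, so that only one may hold a value? */
int regs_conflict(int a, int b)
{
    return (int)((regs[a].conflicts >> b) & 1u);
}

/* Round a byte offset up to the alignment required by type t. */
int align_offset(int offset, enum am_type t)
{
    int a = types[t].align;
    return (offset + a - 1) / a * a;
}
```

With these tables, a register allocator could reject F0 for an integer operand and would know that loading F0 destroys any value held in A or Q. Keeping this information in data rather than in code is what allows the code generation algorithm itself to remain machine-independent.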

2.1 The Intermediate Language 

The intermediate language is the assembly language of the abstract machine. Using the information
contained in the tables constructed from the machine description, the code generator produces a
translation of the source program in the intermediate language. An intermediate language program
consists of a sequence of macro calls, each of which is expanded into one or more object language
statements using the macro definitions provided in the machine description. There are two types of
macros in the intermediate language. The first type are macros which represent the three-address
abstract machine instructions. The second type are keyword macros which correspond to either
assembly-language pseudo-operations or instructions implementing the primitive C control structures.

2.1.1 Abstract Machine Instructions

The abstract machine instructions are three-address instructions which perform the evaluation of C
expressions. The operators of the abstract machine instructions are called abstract machine operators
(AMOPs); the addresses are called references (REFs).

2.1.1.1 AMOPs 

AMOPs are basic C operations which are qualified by the specific abstract machine data types of their
operands. For example, in the HIS-6000 implementation there are four AMOPs corresponding to the C
operator '+':

+i     integer addition

+d     double-precision floating-point addition

+p0    addition of an integer to a pointer to a byte-aligned object

+p1    addition of an integer to a pointer to a word-aligned object

In addition, there are AMOPs for data movement, data type conversion, and conditional jumps. AMOPs are
represented in the compiler as an integer opcode with a value from 0 to 285. The various AMOPs are
listed in Appendix II.
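
The way a single C operator fans out into several type-qualified AMOPs can be sketched in C as follows. This is only an illustration under stated assumptions: the enum names and opcode numbers are hypothetical (the real opcodes are those of Appendix II), and the selection rule shown, keyed on one operand type, is a simplification rather than the compiler's actual algorithm.

```c
/* Operand types for a HIS-6000-like description with two pointer
   classes: p0 (byte-aligned objects) and p1 (word-aligned objects). */
enum operand_type { OT_INT, OT_DOUBLE, OT_PTR0, OT_PTR1, OT_CHAR };

/* Illustrative opcode numbers only, not the values used by the compiler. */
enum amop { OP_NONE = -1, OP_ADD_I, OP_ADD_D, OP_ADD_P0, OP_ADD_P1 };

/* Pick the AMOP for the C operator '+' from the type of its operand. */
enum amop select_add(enum operand_type t)
{
    switch (t) {
    case OT_INT:    return OP_ADD_I;
    case OT_DOUBLE: return OP_ADD_D;
    case OT_PTR0:   return OP_ADD_P0;
    case OT_PTR1:   return OP_ADD_P1;
    default:        return OP_NONE;   /* no '+' AMOP for this type here */
    }
}

/* Printable name of an addition AMOP, in the notation of the list above. */
const char *amop_name(enum amop op)
{
    static const char *const names[] = { "+i", "+d", "+p0", "+p1" };
    return (op >= OP_ADD_I && op <= OP_ADD_P1) ? names[op] : "?";
}
```

In this sketch a character operand has no addition AMOP of its own, so a conversion AMOP would have to be applied first. Whether the real compiler handles characters this way is not stated in this section.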

2.1.1.2 REFs 

A REF is a C-oriented description of the location of an operand or the result of an abstract machine
instruction. A REF may specify either a register of the abstract machine or a location in memory; the
possible classes of memory references include C variables of various storage classes (automatic, static,
external, parameter, temporary) as well as constants and indirect references. A REF is represented by a
pair of integers called REF.BASE and REF.OFFSET. REF.BASE determines either a particular register or a
particular class of memory references; REF.OFFSET determines the exact location given a specific memory
reference class. The possible values of REF.BASE are listed below with their interpretations (actual
integer values are shown for concreteness; the compiler itself uses manifest constants):

REF.BASE    Interpretation

n >= 0      register n (register numbers are assigned to the registers of the abstract
            machine in a predictable manner by GT)
-1          an automatic or temporary variable; OFFSET is the offset of the variable in the
            stack frame
-2          an external variable, referenced by name; OFFSET is an internal identifier
            number
-3          a static (internal) variable; OFFSET is an internal static variable number
-4          a parameter; OFFSET is the offset of the variable or its address in the
            argument list
-5          a label; OFFSET is an internal label number
-6          an integer constant whose value is OFFSET
-7          a floating-point constant; OFFSET is an internal constant number
-8          a character string constant; OFFSET is an internal string number
n <= -9     reference indirect through a pointer in register (-n - 9); OFFSET is the offset
            of the reference relative to the pointer

The specific values of REF.BASE need not be referred to in most macro definitions; the exception is the 
NAME macro, which converts a REF into a symbolic address. 

The representation of a three-address instruction in the intermediate language is that of a macro call with
five or seven integer arguments representing the AMOP and REFs for the result and the operands of the
AMOP. (Each REF consists of two arguments, REF.BASE and REF.OFFSET; only two REFs are provided in
the case of a unary AMOP.) The macro name used in the macro call is of a special form which specifies an
entry in a table produced from the machine description by the GT program; this table entry refers to the
representation of the corresponding macro definition from the machine description.
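
The REF encoding described above can be illustrated with a short C sketch. The structure and function names here are hypothetical and are not taken from the compiler; only the integer values follow the REF.BASE table.

```c
#include <assert.h>

/* Hypothetical C rendering of a REF: the pair (REF.BASE, REF.OFFSET). */
struct ref {
    int base;    /* register number, or a negative memory-class code */
    int offset;  /* meaning depends on base, as in the REF.BASE table */
};

/* Manifest constants for the memory reference classes (values from the table). */
enum {
    C_AUTO = -1, C_EXTERN = -2, C_STATIC = -3, C_PARAM = -4,
    C_LABEL = -5, C_INTLIT = -6, C_FLOATLIT = -7, C_STRING = -8,
    C_INDIRECT = -9   /* -9 and below: indirect through a register */
};

/* Nonzero if the REF names an abstract machine register. */
int ref_is_register(struct ref r) { return r.base >= 0; }

/* Nonzero if the REF is an indirect reference through a pointer register. */
int ref_is_indirect(struct ref r) { return r.base <= C_INDIRECT; }

/* For an indirect REF with base n, recover the pointer register (-n - 9). */
int ref_pointer_register(struct ref r)
{
    assert(ref_is_indirect(r));
    return -r.base - 9;
}
```

Under this encoding a REF with base -11 denotes a reference through the pointer held in register 2, and a REF with base -6 carries its integer constant directly in the offset field.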

2.1.2 Keyword Macros 

Keyword macros are those macro calls which, along with the three-address instructions, make up an
intermediate language program. Unlike AMOP macros, whose names are generated by GT, the names of
the keyword macros are predefined, as are their functions. For example, keyword macros are used to
define external variable names and internal labels, to specify initial values in storage, and to produce the
function prologs and epilogs. The various keyword macros defined in the intermediate language are listed
below along with a brief description of their function; a more complete set of descriptions appears in
Appendix III.

macro       function

HEAD        produce header statements, if needed
ENTRY       define an entry point
EXTRN       define an external reference
INT         define an integer constant
CHAR        define a character constant
FLOAT       define a floating-point constant
NFLOAT      define a negative floating-point constant
DOUBLE      define a double-precision float constant
NDOUBLE     define a negative double-precision constant
ADCONn      define a class n pointer constant
STRCON      define a pointer referencing a string constant
EQU         define a symbol
ZERO        define an area of storage initialized to zero
STATIC      define a static variable
STRING      define the string constants
ALIGN       force an alignment of the location counter
LN          define a line-number symbol
LABCON      define a label constant
LABDEF      define an internal label
IDN         translate an internal identifier number
            into the corresponding assembler symbol
END         produce an end statement, if needed
PROLOG      produce the prolog code of a C function
EPILOG      produce the epilog code of a C function
CALL        produce a function call
RETURN      produce code for a return statement
GOTO        produce a jump to a label expression
LSWITCH     produce a switch jump (list version)
TSWITCH     produce a switch jump (table version)



The actual macro names which appear in an intermediate language program are abbreviations of the
names listed above.



2.2 The Machine Description 

The machine description is a "program" written in a special-purpose language from which are constructed
the machine-dependent tables of the generation phase. The machine description has two functions: (1) it
defines the particular abstract machine for which the code generator produces intermediate code, and (2)
it specifies the translation from an intermediate language program to the corresponding object language
program.

The abstract machine is defined in two sections of the machine description. First, a set of definition
statements defines the registers and memory of the abstract machine. Second, in the OPLOC section, the
AMOPs are defined in terms of their operand/result locations. The translation from the intermediate
language to the object language is specified by a set of macro definitions in the macro section of the
machine description. More information on the writing of a machine description may be found in Appendix
I; the machine description used in the HIS-6000 implementation is listed in Appendix IV.

2.2.1 Defining the Abstract Machine 

In the machine description, the implementer first defines the registers of the abstract machine. For
example, the statement

regnames (x0,x1,x2,x3,x4,a,q,f);

defines the eight abstract machine registers used in the HIS-6000 implementation. The registers X0
through X4 correspond to the first five of eight HIS-6000 index registers, the A and Q correspond to the
accumulators, and the F register is a fictitious floating-point accumulator which corresponds to the
combined A, Q, and E (exponent) registers on the HIS-6000. The fact that the F register conflicts in the
target machine with the A and Q registers is specified by the statement

conflict (a,f),(q,f); 

The remaining HIS-6000 index registers are not represented in the abstract machine since it was not
desired that they be used by the code generator in the evaluation of expressions; two of those registers
hold "environment pointers," the other is used as a scratch register by some of the macro definitions.
There is nothing that requires that the abstract machine registers be implemented as actual machine
registers on the target machine; they may also be implemented as fixed memory locations.

For convenience, the abstract machine registers can be gathered into classes; for example, in the
HIS-6000 implementation, the statement

class x(x0,x1,x2,x3,x4), r(a,q);

defines the class of index registers X and the class of general registers R.

The implementer also defines the classes of abstract machine pointers. Pointer classes are necessary on
machines which are not byte-addressed since pointers to byte-aligned objects will be handled differently
than pointers to word-aligned objects. In the HIS-6000 machine description, the statement

pointer p0(1), p1(4);

defines the class P0 of byte pointers and the class P1 of word pointers. The "4" indicates that the value
of a P1 pointer is always a multiple of four bytes. The fact that there are four bytes per word on the
HIS-6000 is specified in the statement

size 1(char), 4(int,float), 8(double);

A similar statement is used to specify the alignment restrictions.
The statement

type int(r), char(r), float(f), double(f), p0(r), p1(x);

defines the registers which can hold values of each of the abstract machine data types. For example, in
the HIS-6000 implementation, word pointers are held in the index registers X while byte pointers are held
in the general registers R.

The definition of the abstract machine is completed in the OPLOC section of the machine description
where the implementer specifies the behavior of the abstract machine operations in terms of their
operand/result locations. For example, the location definition

+d: f,M,f; 

specifies that the AMOP '+d' (double-precision floating-point addition) can take its first operand in the F
register and its second operand in any memory location and, under these circumstances, the result is
placed in the F register. The construct on the right in the location definition is called an OPLOC; it
consists of three location expressions, one for the first operand, second operand, and result (reading from
left to right). A location expression may specify any set of abstract machine registers or any set of
memory reference classes; for example, the location expression

r|x 

represents the set consisting of the general registers R and the index registers X, and the location 
expression 

~ intlit 

represents the set consisting of all memory reference classes except that of integer constants. An OPLOC 
may specify that the result is placed in the first or second operand location. For example, the location 
definition 

+i: r,M,1;

specifies that the AMOP '+i' (integer addition) takes its first operand in a general register and its second
operand in any memory location, and the result is placed in the register which contained the first
operand. This location definition is equivalent to

+i: a,M,a; q,M,q; 

which explicitly lists the two alternatives. An OPLOC may also specify that the contents of certain 
registers are destroyed during the execution of an AMOP; for example, the location definition 

*i: q,M,q [a];

specifies that an integer multiplication destroys the contents of the A register. 
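
One way to picture the information carried by an OPLOC is as a small record of bit sets. The sketch below is illustrative only: the type names, the register numbering (taken here from the order of the regnames statement) and the encoding of memory classes as an eight-bit mask are all assumptions, not the compiler's actual tables.

```c
#include <assert.h>

/* Registers of the HIS-6000 abstract machine, numbered in regnames order (assumed). */
enum { X0, X1, X2, X3, X4, RA, RQ, RF };

#define BIT(n)   (1u << (n))
#define CLASS_R  (BIT(RA) | BIT(RQ))   /* the class r of general registers */
#define ALL_MEM  0xFFu                 /* hypothetical: one bit per memory class */

/* A location expression: a set of registers or a set of memory classes. */
struct locexpr {
    unsigned regs;  /* acceptable registers, one bit each */
    unsigned mem;   /* acceptable memory reference classes, one bit each */
};

/* Where the result goes: its own location set, or operand 1 or 2. */
enum result_kind { RES_OWN, RES_FIRST, RES_SECOND };

struct oploc {
    struct locexpr first, second, result;
    enum result_kind kind;
    unsigned clobbers;  /* registers destroyed by the operation */
};

/* "*i: q,M,q [a];"  integer multiply: result in Q, A destroyed. */
static const struct oploc mul_i = {
    { BIT(RQ), 0 }, { 0, ALL_MEM }, { BIT(RQ), 0 }, RES_OWN, BIT(RA)
};

/* "+i: r,M,1;"  integer add: result left in the first operand's register. */
static const struct oploc add_i = {
    { CLASS_R, 0 }, { 0, ALL_MEM }, { 0, 0 }, RES_FIRST, 0
};

/* Nonzero if register reg is an acceptable first operand location. */
int first_operand_ok(const struct oploc *o, int reg)
{
    assert(reg >= 0 && reg < 8);
    return (o->first.regs & BIT(reg)) != 0;
}

/* Set of registers in which the result may end up. */
unsigned result_registers(const struct oploc *o)
{
    switch (o->kind) {
    case RES_FIRST:  return o->first.regs;
    case RES_SECOND: return o->second.regs;
    default:         return o->result.regs;
    }
}
```

With this representation, the equivalence between "+i: r,M,1;" and the explicit pair "a,M,a; q,M,q;" shows up as result_registers returning exactly the set {A, Q} for the add.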

2.2.2 Defining the Object Language 

The translation from the intermediate language to the object language is specified by a set of macro 
definitions included in the machine description; macro definitions are provided for the abstract machine 
instructions and the keyword macros. The simplest form of a macro definition is a single character string 
which is substituted for the macro call during macro expansion. For example, the macro definition for 
floating-point unary minus used in the HIS-6000 implementation is 

-ud: " FNEG" 

This macro definition specifies that each occurrence of a '-ud' abstract machine instruction is to be
translated into the assembly language instruction "FNEG" which complements the contents of the F
register. The macro definition for '-ud' is closely related to the location definition for '-ud',

-ud: f,,1;

which states that the operand is found in the F register and that the result is placed in the F register. A 
macro definition for an AMOP can assume that the actual operand/result locations appearing in an 
abstract machine instruction satisfy the constraints specified in the corresponding location definition; at 
the same time, a macro definition must produce correct code for all combinations of operand/result 
locations allowed by the location definition. 

A macro definition for an abstract machine instruction can refer to symbolic representations of the
operation and the operand/result locations by using the character sequences #O (operation), #F (first
operand), #S (second operand), and #R (result). These character sequences are abbreviations for calls to
an implementer-defined macro which converts an AMOP opcode or a REF into the desired object language
representation. For example, the macro definition for '+i' (integer addition) in the HIS-6000
implementation is

+i: " AD#R #S"

If the first operand location (which is also the result location) is the A register and the second operand is
an external variable "X", then the code produced by this macro definition is

ADA X

which adds the contents of "X" to the A register. A macro definition can also contain character strings
whose inclusion in the expansion of a macro call is conditional upon the locations of the operands and/or
result. An example is the HIS-6000 macro definition for '<<' (left shift)

<<:

(,intlit,):

(,~intlit,):

which produces different code sequences depending upon whether or not the second operand (the
number of bit-positions to shift) is an integer constant. A macro definition may include references to the
arguments of the macro call using the character sequences #0, #1, and so on; a macro definition may also
include embedded macro calls, such as the one in the last example which returns the value of the integer
constant.
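
The substitution step of simple macro expansion can be sketched in C as follows. This is a minimal illustration, not the compiler's expander: the function name is hypothetical, the escape character is assumed to be '#', only the operand and result sequences are handled, and the caller is assumed to have already converted each REF to its symbolic form.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Expand a macro body into out (capacity cap, NUL-terminated on success).
 * #F, #S and #R are replaced by the symbolic first operand, second operand
 * and result; every other character is copied through unchanged.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
int expand_macro(const char *body, const char *first, const char *second,
                 const char *result, char *out, size_t cap)
{
    size_t n = 0;
    assert(body != NULL && out != NULL && cap > 0);
    while (*body) {
        const char *sub = NULL;
        if (body[0] == '#') {
            switch (body[1]) {
            case 'F': sub = first;  break;
            case 'S': sub = second; break;
            case 'R': sub = result; break;
            }
        }
        if (sub != NULL) {
            size_t len = strlen(sub);
            if (n + len >= cap) return -1;
            memcpy(out + n, sub, len);
            n += len;
            body += 2;           /* skip the two-character escape */
        } else {
            if (n + 1 >= cap) return -1;
            out[n++] = *body++;  /* ordinary character: copy through */
        }
    }
    out[n] = '\0';
    return 0;
}
```

Applied to the integer addition body with the A register as first operand and result and "X" as second operand, the sketch yields the "ADA X" instruction of the example; a body with no escapes, such as the one for FNEG, is copied unchanged.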

A macro definition may also be specified in the form of a C routine. C routine macro definitions are used
when processing is needed which is beyond the capabilities of the simple macro scheme so far described.
C routine macro definitions may define global variables, perform arithmetic and logical operations, and
select code sequences on conditions other than operand locations. In the present implementation,
however, C routine macro definitions are unable to interact with the code generation algorithm. In the
HIS-6000 implementation, C routine macro definitions are used to translate REFs into GMAP symbols, to
translate the source language representations of identifiers and floating-point constants into GMAP, to
define character string constants, and to buffer characters while defining storage for variables (GMAP
does not have a byte location counter, as is assumed in the intermediate language). The C routine macro
definitions used in the HIS-6000 implementation are listed in Appendix V.

3. Generating Code for an Abstract Machine

The most interesting part of the compiler is the code generator since, unlike most code generators which 
produce code for a fixed target language, the code generator of the C compiler is designed to produce 
code for a class of abstract machines. 

3.1 Functions of the Code Generator 

The code generation process consists of three fairly distinct functions. First, there is the generation of 
intermediate language statements to define and initialize static data areas and constants. Second, there is 
the translation of source language control structures into labels and branches. Third, there is the 
translation of source language expressions into sequences of abstract machine operations. 

The C compiler is designed to produce assembly language code for conventional machines; thus, the 
intermediate language statements for defining and initializing static data areas directly correspond to 
assembly language statements which define symbols, define constants, and align the location counter. The 
only complication is that the code generator must use the size and alignment information from the machine 
description in order to specify the sizes and alignments of data areas. More information and redundancy 
could be added to the intermediate language in order to accommodate a larger class of target languages;
see [16] for examples. Another possible improvement would be to emit segment-specifying instructions
so that the output could be segregated into different segments according to whether it is code, pure data, 
impure data, or uninitialized data. 
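
As an illustration of how size and alignment information might be applied when laying out a static data area, consider the following sketch. The sizes are those of the HIS-6000 size statement quoted in section 2.2.1; the alignment values are an assumption (the text says only that a similar statement specifies them), and the function names are hypothetical.

```c
#include <assert.h>

enum dtype { T_CHAR, T_INT, T_FLOAT, T_DOUBLE };

/* Sizes in bytes, from "size 1(char), 4(int,float), 8(double);". */
static const unsigned size_of[]  = { 1, 4, 4, 8 };
/* Alignments in bytes: assumed equal to the sizes for this sketch. */
static const unsigned align_of[] = { 1, 4, 4, 8 };

/* Round a byte location counter up to the next multiple of align. */
unsigned align_up(unsigned counter, unsigned align)
{
    assert(align > 0);
    return (counter + align - 1) / align * align;
}

/* Place an object at the next aligned offset and advance the counter. */
unsigned place(unsigned *counter, enum dtype t)
{
    unsigned at = align_up(*counter, align_of[t]);
    *counter = at + size_of[t];
    return at;
}
```

Placing a char, an int, and a double in that order from offset 0 puts them at byte offsets 0, 4, and 8; the gaps are where an ALIGN macro call would be emitted.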

The process of translating source language control structures into labels and branches is rather
straightforward. The only complications come when emitting conditional branches which test the value of
an expression; these problems are covered in the next section. 

3.2 Generating Code for Expressions 

The generation of code for expressions is the most difficult part of the problem. The code generator 
must generate a correct sequence of abstract machine instructions to carry out the indicated operations. 
The operand and result locations it specifies in the abstract machine instructions must conform to the 
location definitions provided in the machine description. Moreover, the code generator must keep track of
the locations of all intermediate results and correctly administer the abstract machine registers and 
temporary locations. 

The generation of code for expressions is performed in two steps, semantic interpretation and code 
generation. 

3.2.1 Semantic Interpretation 

The code generator receives expressions in the form of syntax trees whose interior nodes are source 
language operators and whose leaf nodes are identifiers and constants. Thus, an expression can be 
considered to consist of a "top-level" operator along with zero or more operand expressions. The first 
step in the processing of an expression consists of translating a tree in this form to a more descriptive 
form whose interior nodes are AMOPs. This translation involves checking the data types of operands, 
inserting conversion operators where necessary, and choosing the appropriate AMOPs to express the 
semantics of the source language operators. The selection of an AMOP to replace a source language 
operator is based primarily on the data types of the operands. For example, on this basis, an addition 
operator may be translated into either integer addition, double-precision floating-point addition, or one of 
a number of pointer addition AMOPs. However, it is useful to be able to choose AMOPs also on the basis 
of what is provided in the machine description. The basic idea is that of defaults. If the semantics of a 
particular AMOP can be expressed in terms of a composition of more basic AMOPs, then the AMOP can be 
left undefined in the machine description; the code generator can use the equivalent composition of 
AMOPs instead. The advantage of having optional AMOPs is that the implementer need define one of 
these optional AMOPs in the machine description only if his definition will result in sufficiently better code
than will be produced using the equivalent composition of more basic AMOPs. 

An example of this technique is the handling of a class of C operators called assignment operators. An
example of an assignment operator is '=+', where "L =+ R" is defined to be the same as "L = L + R" except
that the expression L is evaluated only once (it may contain side-effects). Consider an expression
"L =op R." If the corresponding abstract machine assignment operator is defined in the machine
description, then the source language assignment operator is translated into that abstract machine
operator; otherwise, the expression "L =op R" is converted to the equivalent form "L = L op R", except
that there is only one copy of "L" having two pointers to it (a flag is set in the root node of "L" so that
later routines will recognize this fact). Therefore, a particular abstract machine assignment operator need
be included in the machine description only if the code sequences it generates are better than the code
that would be generated by the equivalent assignment expression. An example from the HIS-6000
implementation is the abstract machine operator '=+i' (integer addition-assignment) which is translated
into an add-to-storage instruction. The corresponding floating-point assignment operator '=+d' is not
defined in the machine description since no floating-point add-to-storage instruction exists on the
machine.
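
The default treatment of an assignment operator can be sketched as a tree rewrite. The node layout, the operator codes, and the function names below are hypothetical; the sketch shows only the idea of a single shared copy of L with a flag set in its root node.

```c
#include <assert.h>
#include <stdlib.h>

enum op { OP_IDENT, OP_ADD, OP_ASSIGN, OP_ADD_ASSIGN };

struct node {
    enum op op;
    struct node *left, *right;
    int shared;   /* set when two parents point at this node */
};

/* Allocate a tree node (no cleanup is attempted in this sketch). */
struct node *mknode(enum op op, struct node *l, struct node *r)
{
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->op = op; n->left = l; n->right = r; n->shared = 0;
    return n;
}

/*
 * Lower "L =+ R". If the machine description defines the assignment AMOP
 * (amop_defined nonzero) the node is kept as it is; otherwise it is
 * rewritten in place to "L = L + R" with one shared copy of L.
 */
struct node *lower_add_assign(struct node *e, int amop_defined)
{
    assert(e != NULL && e->op == OP_ADD_ASSIGN);
    if (amop_defined)
        return e;
    e->left->shared = 1;                        /* L is evaluated only once */
    e->right = mknode(OP_ADD, e->left, e->right);
    e->op = OP_ASSIGN;
    return e;
}
```

After the rewrite, the left operand of the assignment and the left operand of the addition are the same node, which is how later routines can avoid evaluating L twice.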

Other examples of optional AMOPs which have been implemented are the pointer comparison operators
for pointers other than class 0 pointers (the default is to convert to the "greatest common denominator"
pointer class for which the operation is implemented) and the test for null/non-null pointer operators (the
default is to convert the pointer to an integer and test for equality/inequality with 0). Other promising
candidates for being optional AMOPs are the various increment and decrement AMOPs.

3.2.2 Code Generation 

The second step in the processing of an expression is the generation of a sequence of abstract machine
instructions to carry out the evaluation of the expression. This code generation is performed by a set of
recursive routines, some of which will be described in this section. The operation of the code generation
routines is basically top-down. When a call is made to generate code to evaluate an expression, a set of
desired locations for the result of that evaluation is also specified. This specification, along with other
available information about the operands of the top-level operator of the expression, is used to choose
one of the OPLOCs from the top-level operator's location definition in the machine description (location
definitions are described in section 2.2.1). From the chosen OPLOC and, possibly, the desired locations for
the result of the expression are derived sets of desired locations for the operands of the top-level
operator. Recursive calls are then made to generate code to evaluate the operands into these desired
locations. Next, an abstract machine instruction is emitted for the top-level operation. Finally, if
necessary, abstract machine instructions are emitted to move the result of the expression to an
acceptable location.

3.2.2.1 Specifying Desired Locations 

A set of desired result locations is specified by a structure called a LOC. A LOC structure has two integer
members, LOC.FLAG and LOC.WORD. The possible values of LOC.FLAG are listed below along with their
interpretations:

LOC.FLAG    interpretation

0           the "result" is the internal label specified by LOC.WORD (used only for
            conditional jump AMOPs)

1           the result is to be placed in a register; acceptable registers are specified by
            one-bits in LOC.WORD (bit 0 corresponds to register number 0, etc.)

2           the result is to be placed in memory; acceptable classes of memory references
            are specified by one-bits in LOC.WORD (this field is used only to select registers
            for pointers in indirect references)

3           the result may be left in any location acceptable for values of the particular
            data type

Note that a particular memory location is never specified as the desired location for a result; rather, 
classes of possible memory locations are specified. 

For convenience, if the LOC passed to the top-level code generation routine specifies that the result is 
desired in a register, then all registers not capable of containing the particular data type of the 
expression being evaluated (as defined in the TYPE statement of the machine description) are removed 
from the LOC. Similarly, if the LOC specifies memory reference classes, then all indirect classes where the 
pointer register is unable to hold pointers of the corresponding pointer class (as specified by the TYPE 
statement) are removed from the LOC. Thus, where the code generator simply desires that a value be in a
register, it may provide a LOC specifying that the result may be left in any register.

The removal of "impossible" registers from a LOC is not performed when such an action would leave no 
remaining acceptable registers; this situation can actually occur in certain special cases, such as return 
statements, where an operation requires a value in a register not normally used to hold values of that 
type. 
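
The removal of impossible registers from a LOC, including the exception just described, amounts to a guarded set intersection. The following sketch assumes a hypothetical C layout for the LOC structure and takes the set of registers allowed by the TYPE statement as a bit mask.

```c
#include <assert.h>

/* LOC.FLAG values as listed in the table of section 3.2.2.1. */
enum { LOC_LABEL = 0, LOC_REG = 1, LOC_MEM = 2, LOC_ANY = 3 };

struct loc {
    int flag;       /* LOC.FLAG */
    unsigned word;  /* LOC.WORD: label number or bit set */
};

/*
 * Remove from a register LOC every register that cannot hold the data
 * type of the expression. type_regs is the register set taken from the
 * TYPE statement. If the removal would leave no acceptable register,
 * the LOC is returned unchanged.
 */
struct loc prune_loc(struct loc want, unsigned type_regs)
{
    assert(want.flag >= LOC_LABEL && want.flag <= LOC_ANY);
    if (want.flag == LOC_REG) {
        unsigned kept = want.word & type_regs;
        if (kept != 0)
            want.word = kept;
    }
    return want;
}
```

For instance, with registers numbered in regnames order, a request for "any register" (mask 0xFF) for an integer is narrowed to the A and Q registers (mask 0x60), while a request for an integer in the F register alone (mask 0x80) is left as it stands.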

3.2.2.2 TTEXPR 

The top-level code generation routine is TTEXPR. The function of TTEXPR is to generate a sequence of 
abstract machine instructions which will evaluate a given expression and leave the result in an acceptable 
location, as specified by a LOC parameter. The operation of TTEXPR begins with the removal of 
impossible cases from the LOC parameter, as described above. Then, TTEXPR passes the expression and 
LOC parameters to a routine CGEXPR, which generates abstract machine instructions to evaluate the 
expression, using the LOC parameter as a non-binding indication of preference. Finally, TTEXPR calls the 
routine CGMOVE to emit, if necessary, abstract machine instructions to move the result to an acceptable 
location. 

3.2.2.3 CGEXPR 

The function of CGEXPR is to generate a sequence of abstract machine instructions which will evaluate a 
given expression. CGEXPR is given a LOC argument which specifies preferred locations for the result of 
the expression; however, unlike TTEXPR, this specification is non-binding and is used only where a choice 
exists. 

The operation of CGEXPR consists basically of testing for a set of special cases and then performing the 
appropriate action, which is usually to call another routine which does the real work. The first special 
case is where the expression node is shared and the expression has already been evaluated; in this case, 
no action need be taken. Another special case is where the top-level operator is a conditional AMOP and 
a value is desired (as opposed to a jump, which is the usual case); in this case, a routine JUMPVAL is 
called to emit the desired code. The other special cases involve particular top-level operators: 
indirection, assignment, conditional expression, function call, and the "leaves" of the expression tree,
identifiers and literals; in these cases, the code generation routine corresponding to the particular
top-level operator is called. Finally, in all other cases, the routine CGOP is called to emit code to evaluate
the expression.

3.2.2.4 CGOP 

The function of CGOP is to emit code to evaluate an expression whose top-level operator is not one 
special-cased by CGEXPR. Like CGEXPR, CGOP is passed a LOC indicating non-binding preferences for the 
location of the result of the expression. 

The operation of CGOP is performed in six steps. First, a routine CHOOSE is called to select an OPLOC 
from the top-level operator's location definition in the machine description. Second, desired locations for 
the operands of the top-level operator are determined. Third, a routine EXPR2 is called which makes
recursive calls on TTEXPR to emit code to evaluate the operands into the desired locations. Fourth, code 
is emitted to save any registers which are specified in the machine description to be clobbered by the 
execution of the top-level operator. Fifth, the exact location of the result of the expression is 
determined. Sixth, the actual abstract machine instruction for the top-level operator is emitted. 

If the result location specified by the LOC parameter is a label, or if the selected OPLOC specifies that the
result is left in the first or second operand location, then the exact location of the result of the 
expression is fixed. Otherwise, a particular register must be chosen from the set of registers specified in 
the result field of the OPLOC (the compiler is currently unable to handle OPLOCs which specify a set of 
memory references as the location of the result). In the search for a result register, the priorities are as 
follows: first, free registers which are preferred result locations; second, busy registers which are 
preferred result locations; third, free registers which are not preferred result locations; and fourth, busy 
registers which are not preferred result locations. If a busy register is selected, register contents are 
saved in temporary locations as necessary. 

For the purposes of finding a result register, a register containing an operand is considered free and a 
register containing a pointer to an operand is given lowest priority. A register containing a pointer to an 
operand is protected because the implementation of an AMOP may alter the contents of the result register
before the operand referenced by the pointer in that register is used. An example is the following
HIS-6000 code for the AMOP '+p1' (addition of an integer to a pointer to a word-aligned object):

LXL0 I

ADLX0 P

This code loads index register 0 with the integer I and then adds to register 0 the pointer P. (The code
for the AMOP includes the load instruction since in general integers cannot be stored in the HIS-6000
index registers as they are only halfword registers.) If the code generated for P leaves P referenced
through index register 0, the load instruction will "clobber" register 0 before P is accessed by the add
instruction:

LXL0 I

ADLX0 0,0

However, if index register 0 is protected, index register 1 will be chosen instead to hold the result,
producing the following correct code:

LXL1 I

ADLX1 0,0
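
The search order for a result register can be expressed compactly in C. The sketch below is an illustration under stated assumptions: the function name and the bit-mask arguments are hypothetical, registers holding operands are assumed to have been removed from the busy set by the caller, and ties within one priority level are broken by lowest register number, which the text does not specify.

```c
#include <assert.h>

#define NREGS 8

/*
 * Pick a result register from candidates (the result field of the OPLOC).
 *   preferred - registers named by the non-binding LOC
 *   busy      - registers holding live values
 *   protect   - registers holding a pointer to an operand (lowest priority)
 * Returns the register number, or -1 if candidates is empty.
 */
int choose_result_register(unsigned candidates, unsigned preferred,
                           unsigned busy, unsigned protect)
{
    int pass, r;
    assert((candidates >> NREGS) == 0);
    for (pass = 0; pass < 5; pass++) {
        for (r = 0; r < NREGS; r++) {
            unsigned bit = 1u << r;
            int is_pref = (preferred & bit) != 0;
            int is_busy = (busy & bit) != 0;
            int rank;
            if (!(candidates & bit))
                continue;
            if (protect & bit)  rank = 4;                /* pointer to an operand */
            else if (is_pref)   rank = is_busy ? 1 : 0;  /* preferred: free, then busy */
            else                rank = is_busy ? 3 : 2;  /* others: free, then busy */
            if (rank == pass)
                return r;
        }
    }
    return -1;
}
```

In the situation of the example, the candidates are the five index registers and register 0 holds the pointer to P, so the sketch selects register 1; if no register were protected it would select register 0.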




3.2.2.5 Selecting an OPLOC 

The purpose of OPLOC selection is to select a set of operand/result locations for the top-level operator 
of an expression by choosing one of the OPLOCs from the location definition of the operator in the 
machine description. The choice of operand/result locations will affect the amount of code produced to 
evaluate the expression, both because of different code sequences which may be produced by the macro 
definition for the operator and because of additional loading, storing, and saving operations which may be 
required in order to set up the operands and move the result to an acceptable location. A general 
solution, taking into account all possible locations of operands and results, is a complex optimization
problem. Instead, a more limited approach has been taken which uses the provided preferences for 
result locations and available information about the possible result locations of the top-level operators in 
the operand subexpressions. For example, if an operand is an identifier, then its location is known to be 
a memory reference of a particular class. Similarly, various operators may be defined in the machine 
description to always place their result in one of a particular set of registers. Using information of this 
sort, plus knowledge about the current register usage, a rough estimate can be made of the number of 
additional load and store instructions which will be required for each OPLOC in the location definition; 
from the set of OPLOCs, the one with the lowest additional cost is chosen. 

For example, consider the expression "I + (J / K)." (For clarity, source language operator symbols are
used in this example to represent the corresponding integer abstract machine operations.) Assume the 
following location definitions (the OPLOCs are numbered for future reference): 



+:   r,r,1;           (1)
     r,M,1;           (2)
     M,r,2;           (3)

/:   r1,r,1 [r2];     (4)
     r2,r,1 [r3];     (5)
     r3,r,1 [r4];     (6)
     r1,M,1 [r2];     (7)
     r2,M,1 [r3];     (8)
     r3,M,1 [r4];     (9)



Here M represents all memory reference classes and r represents a set of general registers consisting of 
rl, r2, r3, and r4. The division operator is modeling a machine instruction which produces pairs of results 
(the quotient and remainder) in adjacent registers. For the division abstract machine operator, only the 
quotient is used; the other register is considered to be "clobbered" by the execution of the operator. 
Note that one can deduce from these location definitions that both operators always leave their results in 
general registers. 

The generation of code for the expression "I + (J / K)" begins with the selection of an OPLOC from the
location definition of the '+' operator. In this case, all of the OPLOCs specify the same set of result
locations (the general registers); thus, the desired locations for the result of the expression do not
affect the choice of OPLOCs. Instead, the choice is made on the basis of the possible locations for the
operands. In this case, the first operand is a variable I which is known to be a memory reference of a
particular class. The second operand is the result of a division operator which is known to leave its
results in either r1, r2, or r3. On this basis, OPLOC (3) is chosen because no extra operations are needed
to move the operands into acceptable locations, whereas both OPLOCs (1) and (2) do require such extra
operations.
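
The choice among the three '+' OPLOCs can be reproduced with a very rough cost model. The sketch below is not the compiler's CHOOSE routine: it counts one extra move for each operand whose possible locations are not all acceptable to the OPLOC, ignores current register usage and result preferences, and uses hypothetical names and bit-mask encodings (registers r1 through r4 as bits 1 through 4).

```c
#include <assert.h>
#include <stddef.h>

/* A set of possible or acceptable locations: register bits and memory-class bits. */
struct locset { unsigned regs, mem; };

/* Operand part of an OPLOC (result and clobber fields omitted in this sketch). */
struct oploc2 { struct locset first, second; };

/* 1 if every location in have is acceptable to want, else 0. */
static int fits(struct locset have, struct locset want)
{
    return (have.regs & ~want.regs) == 0 && (have.mem & ~want.mem) == 0;
}

/* Rough cost: one extra load or store per operand that may need moving. */
static int extra_moves(const struct oploc2 *o, struct locset a, struct locset b)
{
    return !fits(a, o->first) + !fits(b, o->second);
}

/* Index of the cheapest OPLOC; the earliest one wins ties. */
size_t choose_oploc(const struct oploc2 *tab, size_t n,
                    struct locset a, struct locset b)
{
    size_t i, best = 0;
    assert(tab != NULL && n > 0);
    for (i = 1; i < n; i++)
        if (extra_moves(&tab[i], a, b) < extra_moves(&tab[best], a, b))
            best = i;
    return best;
}
```

With I known to be in memory and the result of the division known to be in r1, r2, or r3, the estimated costs of OPLOCs (1), (2), and (3) are 1, 2, and 0 extra moves, so the third OPLOC is selected, in agreement with the discussion above.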

Next, a recursive call is made to generate code to evaluate the subexpression "J / K." The desired
locations for the result of this expression are those specified by the chosen '+' OPLOC for its second
operand, namely r, the set of general registers. However, since the '+' OPLOC specifies that the second
operand location is also the location of the result of the '+' operator, the intersection of that location set
with the set of desired locations for the result of the '+' operator is used instead, if that intersection is
non-null. Thus, the following factors are used in selecting an OPLOC for the '/' operator: first, which of
the possible result registers (r1, r2, r3) are desired result locations; second, which of the possible result
registers are free; and third, which of the "clobbered" registers (r2, r3, r4) are free. In this particular
situation, the possible location of the first operand (J) is a memory reference and thus does not favor any
of the OPLOCs. However, the second operand, which is also known to be a memory reference, favors
OPLOCs (7), (8), and (9).

In addition, when selecting an OPLOC from a location definition, certain OPLOCs may be rejected entirely
because they specify conditions which cannot be met. For example, if an OPLOC specifies (either directly
or indirectly through an operand location) that the result is left in a register but the result is desired in
memory, then that OPLOC will be rejected if a temporary location is not acceptable. The OPLOC is
rejected because, given a value in a register, the only general method by which the code generator can
make that value into a memory reference is by saving it in a newly allocated temporary location. (Recall
that a specific memory location is not provided for the result, only a set of acceptable memory reference
classes.) Similarly, if the result will be in memory and is desired in memory, then that OPLOC will be
rejected if there are one or more possible result memory reference classes which are not acceptable
result locations; this is done because the code generator is not capable of transforming a memory
reference from one class to another. Similar checking is performed on the operand location specifications
in the OPLOC: if an operand is required by the OPLOC to be in memory but not all non-indirect memory
reference classes are allowed, then that OPLOC will be rejected if the operand operator is not guaranteed
to place its result in an acceptable memory location or if it can place its result in a register but
temporary locations are not acceptable. These restrictions allow a location definition to contain extra
OPLOCs which apply only in special cases since such OPLOCs will never be chosen unless the special
cases hold.

An example of how the OPLOC selection method can be utilized in the writing of a machine description is
the following definition of the '+p1' AMOP (addition of an integer to a pointer to a word-aligned object)
taken from a hypothetical HIS-6000 machine description (the described OPLOC selection method was not
implemented at the time the actual HIS-6000 machine description was written). The shortest code for
executing the '+p1' operation in the general case is

LXL0 I
ADLX0 P

where I is the integer in the low-order half of a word in memory and P is the pointer in the high-order
half of a word in memory. The result of this operation is left in an index register; thus the OPLOC for this
code sequence is

However, if both the integer and the pointer must be computed into registers (which occurs frequently in
referencing elements of an array), the integer and the pointer must first be stored into temporary
locations before this code sequence can be applied. Therefore, using the given code sequence under
these circumstances results in excessive object code. The desired code is

ALS 18
STA TEMP
ADLX0 TEMP

which shifts the integer in the general register into the high-order halfword, stores it into a temporary
location, and adds it to the pointer in the index register. The OPLOC for this code sequence is

x/.lj 
In the case where the pointer is in an index register and the integer is a constant "n", then the desired
code is

EAX0 n,0

with an OPLOC of

x,intlit,1;

The described OPLOC selection method allows all three OPLOCs to be included in the location definition for
'+p1'. In particular, it guarantees that the third OPLOC will never be selected unless the second operand
is an integer constant.

3.2.2.6 Generating Code for Subexpressions 

After an OPLOC has been selected, CGOP calls a routine EXPR2 to make recursive calls on TTEXPR to
generate code to evaluate the operands of the top-level abstract machine operator. The LOC arguments
passed to TTEXPR in these calls are taken from the operand fields of the selected OPLOC and, in the case
of operators which place their result in an operand location, the desired locations for the result of the 
top-level operator. If there are two operands, EXPR2 makes sure that the two operands will not require 
the use of the same register (for example, by using a register to hold both one operand and a pointer to 
the other operand); this is done by checking the LOCs for "overlap" and removing certain possibilities. In 
addition, EXPR2 evaluates first the operand which is more complicated on the basis of the sizes of the 
subtrees for the two operands; this tends to reduce the number of saving and restoring operations 
performed. In the course of generating code to evaluate an operand of a binary abstract machine 
operator, it may be necessary to use the register containing the already computed value of the other 
operand or a pointer used to reference it, in which case code is generated to save the contents of this 
register in a temporary location. Thus, after generating code to evaluate both operands, EXPR2 calls a 
routine RESTORE to generate code, if necessary, to restore the saved value to its original register. 
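
The ordering rule used by EXPR2 can be sketched as follows. This is an illustration only: the Node type
and the functions tree_size and first_to_evaluate are hypothetical names, and the tie-breaking rule
(favoring the first operand) is an assumption, since the text does not specify one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression-tree node; a leaf has both children NULL. */
typedef struct Node {
    struct Node *left;
    struct Node *right;
} Node;

/* Number of nodes in the subtree rooted at t (0 for an empty tree). */
int tree_size(const Node *t)
{
    if (t == NULL)
        return 0;
    return 1 + tree_size(t->left) + tree_size(t->right);
}

/* Return 1 if the first operand should be evaluated first, 2 otherwise.
   The operand with the larger subtree goes first; ties favor the first
   operand (an assumption made for this sketch). */
int first_to_evaluate(const Node *op1, const Node *op2)
{
    assert(op1 != NULL && op2 != NULL);
    return tree_size(op1) >= tree_size(op2) ? 1 : 2;
}
```

For the expression "I + (J / K)" the second operand has three nodes and the first has one, so the sketch
evaluates the division before the variable I.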

3.2.2.7 Register Management 

The status of the various abstract machine registers with regard to register allocation is contained in an 
array of structures called REGTAB. Each element structure of the array represents the current state of 
one abstract machine register. An element structure consists of two members: UCODE, an integer 
indicating the current use of the register, and REP, a pointer to the subexpression tree whose value is 
currently in the register. The possible values of UCODE are listed below with their interpretations: 

UCODE   Interpretation

 0      the register is free

-1      the register contains the value of the expression pointed to by REP

-2      the register has been marked "do not use unless necessary" for the purpose of
        finding a register for the result of an AMOP; although the register contains a pointer
        to one of the operands of the AMOP, it is free in that it may be selected as a last
        resort without having to save its contents.

n>0     the register does not directly contain a value, but there are "n" conflicting registers
        containing values which must be saved before this register can be used.
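
A table of this kind might be declared in C roughly as shown below. This is a sketch under stated
assumptions, not the declaration used in the compiler: the register count NREGS, the symbolic names for
the UCODE values, the representation of a register set as a bit mask, and the decision to count every
register with a non-zero UCODE as busy are all assumptions. The helper names follow the routines NFREE,
NBUSY, and RESERVE of this section.

```c
#include <assert.h>
#include <stddef.h>

#define NREGS     4        /* assumed number of abstract machine registers */

#define U_FREE    0        /* the register is free */
#define U_VALUE  (-1)      /* holds the value of the expression REP */
#define U_MARKED (-2)      /* "do not use unless necessary" */
/* ucode > 0: that many conflicting registers hold values */

struct expr;               /* subexpression tree, left opaque in this sketch */

struct regent {
    int ucode;             /* current use of the register */
    struct expr *rep;      /* tree whose value is currently in the register */
};

struct regent regtab[NREGS];   /* zero-initialized: every register free */

/* Count the free registers in the set w; bit i of w stands for register i. */
int nfree(unsigned w)
{
    int n = 0;
    for (int i = 0; i < NREGS; i++)
        if (((w >> i) & 1u) && regtab[i].ucode == U_FREE)
            n++;
    return n;
}

/* Count the registers in w that are not free (assumed meaning of "busy"). */
int nbusy(unsigned w)
{
    int n = 0;
    for (int i = 0; i < NREGS; i++)
        if (((w >> i) & 1u) && regtab[i].ucode != U_FREE)
            n++;
    return n;
}

/* Allocate register r to hold the value of e; r must be available. */
void reserve(int r, struct expr *e)
{
    assert(r >= 0 && r < NREGS && regtab[r].ucode == U_FREE);
    regtab[r].ucode = U_VALUE;
    regtab[r].rep = e;
}
```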

The routines used in register management are described below:

CLEAR(R)        Register R, which must directly contain the value of an expression, is made
                available for use; its current value is not saved.

ECLEAR(E)       The register associated with the expression E, if any, is CLEARed.

FREEREG(W)      A register from the set specified by W is made available for use; the
                contents of registers are saved if necessary.

GETREG(W1,W2)   If possible, an unmarked register from the set W1 is made available for
                use. Otherwise, if possible, an unmarked register from the set W2 is made
                available for use. Otherwise, a marked register from the set W1 is made
                available for use. Within each set, free registers are preferred
                to busy registers; if a busy register is chosen, its contents are saved.

MARK(E)         If the expression E is an indirect reference, the register containing the
                pointer is marked "do not use unless necessary."

NBUSY(W)        Return the number of busy registers in the set W.

NFREE(W)        Return the number of free registers in the set W.

RESERVE(R,E)    Register R is allocated to hold the value of the expression E. Register R
                must be available for use.

RESTORE(E)      If the value of the expression E (or a pointer in the case of an indirect
                reference) has been saved in a temporary location, it is restored to the
                original register.

SAVE(R)         Register R is made available for use by saving the contents of whatever
                registers are necessary.

UNMARK(E)       Undo MARK.



The following is a typical series of calls made by CGOP in the generation of code for an expression E
whose top-level operator is a binary operator with operands OP1 and OP2:

OPLOC=CHOOSE(E,LOC)     choose an OPLOC

EXPR2(OP1,OP2)          recursively generate code to evaluate
                        the operands into acceptable locations

ECLEAR(OP1)             make operand registers available for
ECLEAR(OP2)             the result

SAVE(*)                 save "clobbered" registers, if any

MARK(OP1)               mark registers used to hold pointers
MARK(OP2)               to operands

R=GETREG(*,*)           select a result register

UNMARK(OP1)             unmark any marked registers
UNMARK(OP2)

RESERVE(R,E)            reserve result register
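
The preference order described for GETREG can be sketched in C as follows. The sketch is hypothetical in
its details: register sets are represented as bit masks, only the UCODE field of each register is modeled,
the helper name search is invented, and the saving of a busy register's contents is omitted. It is meant
to show the order of preference, not the actual routine.

```c
#include <assert.h>

#define NREGS     4             /* assumed register count */
#define U_FREE    0
#define U_MARKED (-2)

int ucode[NREGS];               /* UCODE of each register; all free initially */

/* Search the set w (bit r stands for register r). If marked is nonzero,
   return the first marked register. Otherwise return the first free
   register, or failing that the first busy unmarked register.
   Returns -1 if no register qualifies. */
int search(unsigned w, int marked)
{
    int busy = -1;
    for (int r = 0; r < NREGS; r++) {
        if (!((w >> r) & 1u))
            continue;
        if (marked) {
            if (ucode[r] == U_MARKED)
                return r;
        } else if (ucode[r] == U_FREE) {
            return r;           /* free registers are preferred */
        } else if (ucode[r] != U_MARKED && busy < 0) {
            busy = r;           /* remember the first busy register */
        }
    }
    return busy;
}

/* Unmarked register from w1, else unmarked from w2, else marked from w1. */
int getreg(unsigned w1, unsigned w2)
{
    int r;
    assert(w1 != 0);            /* an empty set is an impossible situation */
    if ((r = search(w1, 0)) >= 0)
        return r;
    if ((r = search(w2, 0)) >= 0)
        return r;
    return search(w1, 1);
}
```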



3.2.2.8 Possibilities for Failure 



The code generator can fail in two ways: (1) it can reach an impossible situation and announce a compiler
error, and (2) it can unknowingly generate incorrect code. Examples of impossible situations are (1)
discovering that there are no acceptable OPLOCs in the location definition for an operator, (2) being told
that the result must be placed in a register from the empty set of registers, and (3) discovering that an
essential location definition or macro definition of an abstract machine operator was not provided by the
implementer. The most likely cause of a failure is an incorrect machine description. Examples of errors
which can be made in the machine description are (1) an OPLOC specifying that both operands must be in
the same register, (2) an OPLOC specifying a set of memory reference classes for the result location, (3) a
macro definition containing errors, and (4) a macro definition which does not anticipate a particular
operand or result location, or combination thereof, allowed by the corresponding location definition or
otherwise essential (in the case of move operations which must be capable of moving among registers and
between registers and memory). Some of these errors could be detected by the program which processes the
machine description (GT). Another possible cause of failure is an abstract machine with an insufficient
number of registers. Such a machine may require that a register be used to hold both a pointer to an
operand and the result of an operation; as described above, this situation may result in incorrect code.
Hopefully, abstract machine models of real machines will not suffer from this problem. Of course, the
other possible cause of failure is a bug in the code generator itself. It would be interesting and useful if
such a code generation algorithm could be proven correct, given sensible restrictions on the machine
description and the assumption of correct macro definitions.

4. Conclusions

This paper has described the implementation of a portable compiler for the programming language C. The
compiler was first implemented by the author in a seven month period on the Bell Laboratories Computer
Science Research Center's PDP-11/45 UNIX system. The compiler was then used to compile itself, and the
resulting code moved to the HIS-6000. Another month was spent debugging the compiler until the
version of the compiler compiled on the HIS-6000 successfully compiled itself. This was regarded as a
significant test of the compiler.

4.1 The Compiler 

The major problem with the compiler itself is its speed. The compiler appears to be more than twice as
slow as other compilers for similar source languages. This slowness is due almost entirely to the use of a
macro expansion phase (a phase not likely to be present in ordinary compilers), since the compiler tends
to spend half or more of its time in the macro expansion phase. The slowness of the compiler seems to
be a problem inherent in the chosen compiler structure; no amount of mere recoding is likely to
significantly reduce the percentage of time spent in the macro expansion phase. One approach toward
improving the speed of the compiler would be to eliminate non-essential processing such as the
construction and interpretation of character-string representations of macro calls and the rescanning of
macro definitions. The macro language could be modified so that the result of the expansion of a macro
call would never be needed as an argument to another macro call and thus could be printed directly,
rather than returned as a string and rescanned. Given this restriction, the macro definitions could be
compiled into procedures which simply print strings and call other procedures. These procedures could
be called directly by the code generator; alternatively, they could be called by a procedure which
interprets a suitable encoding of the intermediate language.
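
The proposed scheme of compiling macro definitions into procedures might look roughly like the following
C sketch. Everything in it is hypothetical: the opcode numbering, the procedure names m_load and m_add,
the choice of LDA and ADA as the emitted instructions, and the use of a character buffer in place of the
assembly-language output file are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Output buffer standing in for the assembly-language output file. */
char out[256];

/* Append one instruction line to the output without rescanning it. */
void emit(const char *opcode, const char *operand)
{
    size_t used = strlen(out);
    int n = snprintf(out + used, sizeof out - used, "\t%s\t%s\n",
                     opcode, operand);
    assert(n >= 0 && (size_t)n < sizeof out - used);   /* no truncation */
}

/* Hypothetical compiled macro: load an integer into the A register. */
void m_load(const char *operand) { emit("LDA", operand); }

/* Hypothetical compiled macro: add an integer in memory to the A register. */
void m_add(const char *operand)  { emit("ADA", operand); }

typedef void (*Macro)(const char *operand);

enum { OP_LOAD, OP_ADD, NOPS };

/* Dispatch table indexed by a (hypothetical) intermediate-language opcode. */
const Macro macros[NOPS] = { m_load, m_add };

/* Interpret one intermediate-language instruction by calling its procedure. */
void expand(int op, const char *operand)
{
    assert(op >= 0 && op < NOPS);
    macros[op](operand);
}
```

Calling expand(OP_LOAD, "I") followed by expand(OP_ADD, "J") appends the two instruction lines directly to
the output; no intermediate string is constructed and rescanned.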

A second problem with the compiler is its size, in terms of both the amount of file space necessary to
support an implementation of the compiler and the amount of memory required to execute the compiler
phases. The source of the compiler is about 250K characters, the source of GT is about 80K characters;
thus, the file space required for source, object libraries, and executable files is on the order of 1M
characters. Only the size of the code of the code generator is a result of designing the compiler to be
portable; it is likely that a code generator designed for a specific machine would be much smaller. Other
reasons for the large size of the compiler stem from the particular programming techniques used. In
particular, keeping the entire tree representation of a function in core at one time during code generation
requires that a large block of storage be reserved. Also, the use of a bottom-up table-driven LALR(1)
parser seems to result in a larger syntax analysis phase than would result from using recursive descent,
as does the UNIX C compiler. The large size of the compiler limits the number of computer systems which
can support the compiler.

Despite these problems, it is believed that were one prepared to make the investment necessary to 
implement C on another machine, the size difficulties and related costs would be outweighed by the 
relative speed with which one could bring up a working implementation. One could then concentrate on 
making it more efficient, having the advantages of a C compiler to work with and the ability to program in
C.

The least flexible machine-dependent component of the compiler is the code generation algorithm. It is
acknowledged that a clean mechanism for allowing the implementer to tailor the code generation algorithm
through the addition of procedural knowledge would be an improvement. On the other hand, clinging to
the idea that the code of the compiler will never be touched is unrealistic. A likely prospect for
modification is the code related to the calling sequence since it may be desired to use a system standard
calling sequence instead of the one built into the compiler. Another problem which would be solved most
easily by modifying the code generator is the IBM S/360 addressing problem. Because a S/360
instruction cannot contain an arbitrary memory address, C external variables must be referenced by first
loading a register with a pointer to the variable (an address constant) and then using the register as a
base register in the actual instruction. These actions could be performed by the macro definitions using
conditional expansion; however, it would be easier to modify the code generator to handle this particular
case.



The most direct method of moving a portable compiler based on a machine description requires access to 
an existing implementation of the compiler. The process of moving a compiler written in its own language 
from machine A to machine B is as follows: First, one writes a machine description for machine B. 
Second, the machine description is used by a construction program running on machine A to produce a 
new compiler which produces code for machine B. Third, the compiler on machine A is used to compile
the new compiler, producing a compiler which runs on machine A but produces code for machine B.
Fourth, the new compiler is used to compile itself, producing a compiler which runs on machine B and
produces code for machine B. This process is called a half bootstrap. On the other hand, the Poole and
Waite approach does not require the use of an existing implementation. One need write only an 
interpreter or a translator for a very simple abstract machine language in order to move a program to a 
new machine. This technique is called a full bootstrap. In practice, the need for a half bootstrap often 
represents a significant obstacle to moving a program. 

The full bootstrap method can be used to move a portable compiler based on a machine description as 
follows: Initially, a simple imaginary machine is defined as a vehicle for bootstrapping. A compiler which 
runs on and produces code for this imaginary machine is then constructed using the half bootstrap 
method described above. Now, in order to move the compiler to a new machine, one implements an 
interpreter for the imaginary machine on the new machine. This action results in an "existing 
implementation" of the compiler, running on the new machine, which can then be used to carry out the 
half bootstrap as described above. 

4.2 The Compiled Code 

Although there are weak spots, the code produced by the compiler is good considering that it is almost 
completely unoptimized. It is certainly better than would be produced if the abstract machine were the 
typical machine-independent abstract machine with one accumulator and one index register, given the 
same complexity of the macro definitions (they do not perform register allocation). Such an 
implementation would not be able to take advantage of the HIS-6000's two accumulators or the multiple 
index registers, nor would it recognize the fact that byte pointers cannot fit in the index registers. 

One of the weak spots in the compiled code concerns floating-point operations. The code generator
performs all floating-point operations in double-precision, issuing single-to-double conversion
operations before using single-precision operands. It is unable to utilize the HIS-6000 machine 
instructions which operate on a single-precision operand in memory and a double-precision operand in 
the F register. Since the implementation of a single-to-double conversion is to load the single-precision 
operand into the F register, very poor code is produced for single-precision floating-point expressions 
(as opposed to very good code for double-precision expressions). One way to handle this situation would 
be to implement a general subtree-matching facility for optimization. With such a facility, the implementer 
specifies in the machine description that a particular combination of abstract machine operators (specified 
in the form of a tree) is to be replaced by the code generator with a new abstract machine operator; the 
new operator is defined by the implementer in the machine description just like any of the built-in 
operators. In the floating-point case, one would specify that a subtree of the form (using a LISP-like 
notation) 

( double-prec-add ( e1 , single-to-double ( e2 ) ) )

would be replaced by

( single-prec-add ( e1 , e2 ) )

where single-prec-add is a new abstract machine operator which would be defined to be the "FAD"
instruction. This method of subtree-matching can be compared to the hierarchy of abstract machines
method in that the new abstract machine operators can be considered to be instructions of a higher-level
abstract machine. The differences are that, in the case of the subtree-matching method, the definition of 
higher-level operators is optional (thus there is no multistage translation when optimization is not desired 
or needed) and that the implementer defines the higher-level operators to suit his needs. The subtree- 
matching approach to machine-dependent code optimization has been investigated by Wasilew [17]. 
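
The floating-point replacement described above can be sketched as a simple tree rewrite. The sketch is
illustrative only: the Tree type, the operator codes, and the function rewrite are hypothetical, and a
real subtree-matching facility would be driven by patterns given in the machine description rather than
by a pattern written directly into the code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operator codes for the abstract machine operators involved. */
enum Op { LEAF, DOUBLE_ADD, SINGLE_TO_DOUBLE, SINGLE_ADD };

typedef struct Tree {
    enum Op op;
    struct Tree *left;     /* first operand, or NULL */
    struct Tree *right;    /* second operand, or NULL */
} Tree;

/* Rewrite double-prec-add(e1, single-to-double(e2)) into
   single-prec-add(e1, e2), bottom-up and in place. Returns t. */
Tree *rewrite(Tree *t)
{
    if (t == NULL)
        return NULL;
    t->left = rewrite(t->left);
    t->right = rewrite(t->right);
    if (t->op == DOUBLE_ADD && t->right != NULL &&
        t->right->op == SINGLE_TO_DOUBLE) {
        assert(t->right->left != NULL);   /* the conversion has one operand */
        t->op = SINGLE_ADD;
        t->right = t->right->left;        /* drop the conversion node */
    }
    return t;
}
```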

Another weakness in the compiled code concerns array subscripting. Instead of placing the offset of an 
array element into an index register and performing an indexed memory reference, the code generator 
adds the offset to a pointer to the base of the array, producing a pointer (in an index register) which is 
then used to reference the array element. Thus, the code generator regards index registers only as base 
registers to hold pointers, and not as index registers to hold offsets. One reason for not implementing 
the capability of using index registers for subscripting is that this method of subscripting is often not 
possible. For example, on machines like the HIS-6000 with single-indexed instructions, this method can be 
used only for external and static arrays; all other arrays require the use of an index register just to 
reference the base of the array. (Actually, one can perform double-indexing on the HIS-6000 by using 
an indirect word; however, this was not recognized at the time the compiler was written.) The capability 
of using index registers for subscripting could be implemented using the subtree-matching facility 
described above; one would test for subtrees of the form 

( pointer-add ( address-of ( extern | static ), <any> ) ) 

and replace them with a new abstract machine operator which would be defined to produce the desired 
code. A more satisfying solution would give the code generator more knowledge about addressability so 
that it could use index registers for subscripting whenever possible, based on information given in the 
machine description. 

A third weakness of the compiled code is the use of indirection. The code generator only indirects 
through pointers in registers; it is unable to utilize an indirection-through-memory facility (except through 
a specific location which implements an abstract machine register). Again, a better understanding of 
addressing is what is really needed. 

4.3 Summary of Results 

This paper has presented a technique for the design of portable compilers and has demonstrated its 
practicality through the implementation of a portable C compiler. The main difference between this work 
and the previous work described in section 1.2 is that in this work, the system was designed specifically 
for the language being implemented; it is this restriction which contributes most to the practicality of the 
approach. In addition, this work has emphasized the concept of a machine-dependent abstract machine, 
thus tying together the work on portable compilers and program transferability. 

The advantages of the technique presented in this paper over the technique of rewriting some or all of 
the generation phase are (1) that the implementer can modify the compiler to produce code for a new 
machine with less effort and in less time, and (2) that the implementer can be more confident in the 
correctness of the modifications. Almost the entire code of the generation phase, already tested in the 
initial implementation, is unchanged in the new implementation. This code includes the code generation 
algorithm, the register management routines, and the macro expander. Furthermore, the modifications 
which must be made are localized in two areas, the machine description and the C routine macro 
definitions. The implementer is primarily concerned with the correct implementation of the individual 
abstract machine instructions. The interaction among these instructions, in terms of their correct ordering 
and the use of registers and temporary locations, is handled by the code generation algorithm and need 
not be of concern to the implementer. It is this reduction in the complexity of the problem which leads 
to the increased confidence in the results of the modification. 

The portability of the compiler has been tested by the construction of a version of the compiler for the
DEC PDP-10. The initial machine description and macro definitions for the PDP-10 implementation were
written and debugged by the author in a period of two days.

4.4 Further Work

There are three main directions for further work. One is to develop machine models which will allow the
generation of acceptable code for a larger class of machines. Such machine models will have the effect of
reducing the complexity of the descriptions of machines which do not completely correspond to the
machine model described in this paper. With the HIS-6000, for example, the only major area of
complexity in the machine description is that of character manipulation. One would desire a machine
model which allows the implementer to describe more conveniently the implementation of characters on
his machine. Similarly, a machine model which allows a better understanding of addressing would be
desirable.

Another direction for further work is to develop machine-independent code generation algorithms which
will produce more efficient code. In particular, the problem of register allocation under complex
constraints should be examined. In addition, techniques for allowing the implementer to extend easily and
safely the code generation algorithm through the addition of procedural knowledge should be developed.
Such techniques should allow the compiler to be modified to produce code for unanticipated new
machines.

The third direction for further work is to apply the technique of portable compilers to more complicated
and more powerful languages. The technique of using a [...] generation algorithm
and a machine description, even aside from [...] code
generator. It would be interesting to see if this technique could reduce the complexity of code
generators for large languages and whether portability can be obtained without destroying the
efficiency of the object code.

Appendix I - The Machine Description

The form of the machine description is described in detail in the following sections. Examples are taken
from the HIS-6000 machine description given in Appendix IV. [...] explain the process [...]
writes a machine description which will result in the desired code generator.

The convention of writing syntactic alternatives on separate lines is used throughout.

1. Definition Statements

The machine description begins with a series of definition statements. These definition statements are
described in the sections below in the order in which they should appear in the machine description.

1.1 The TYPENAMES Statement

The TYPENAMES statement defines the names which are used in the machine description to represent the
primitive C data types: character, integer, floating-point, and double-precision floating-point. The form of
the TYPENAMES statement is

<typenames_stmt>:   typenames ( <name_list> ) ;

<name_list>:        <name_list> , <name>
                    <name>

The first name corresponds to the internal type number 0, the second to type 1, etc. Because the
internal type numbers are fixed in the compiler, the TYPENAMES statement should always be (equivalent
to)

typenames (char, int, float, double);

1.2 The REGNAMES Statement

The REGNAMES statement defines the names of the abstract machine registers; these registers are
assigned internal register numbers (used in REF.BASE, section 2.1.1.2), starting with register number 0, in
the order in which they appear in the REGNAMES statement. The form of the REGNAMES statement is
similar to that of the TYPENAMES statement; for example, the REGNAMES statement used in the HIS-6000
implementation is

regnames (x0, x1, x2, x3, x4, a, q, f);

In this example, all but the F register correspond directly to actual registers on the HIS-6000: registers
X0 through X4 are the first five (out of eight) index registers, and A and Q are the two
accumulators. The F register is a fictitious floating-point accumulator which in reality corresponds to the
combined A, Q, and E (exponent) registers. The fact that the F register conflicts with the A and Q
registers is specified in the CONFLICT statement, described below. Only those actual machine registers
which are to be used by the code generator in producing code to evaluate expressions should be included
in the REGNAMES statement; registers used only for environment pointers, address calculations,
or other scratch calculations performed within the code for a single AMOP should not be included in the
REGNAMES statement. For example, on the HIS-6000, three index registers are not defined in the
REGNAMES statement: X7, which contains a pointer to the current stack frame, X6, which contains a
pointer to the current argument list, and X5, which is used as a scratch register by AMOPs which access
characters.

1.3 The MEMNAMES Statement

The MEMNAMES statement associates names with the various classes of memory references as specified
by negative values of REF.BASE (section 2.1.1.2). The form of the MEMNAMES statement is similar to that
of the TYPENAMES statement; for example, the MEMNAMES statement used in the HIS-6000
implementation is

memnames (reg, auto, ext, stat, param, label, intlit, floatlit, stringlit, ix0, ix1, ix2, ix3, ix4, ia, iq);

The first nine names refer to predefined memory reference classes (REF.BASE = 0,-1,-2, ... ,-8); the
remaining names refer to indirect references through the abstract machine registers defined in the
REGNAMES statement (REF.BASE = -9,-10, ... ). The first name "reg" is never used; it serves only as a
placeholder. No name is provided for indirect references through the F register since that register is
not used to hold pointers and, being the highest numbered register, omitting it does not affect the
positions of the other names in the list.

1.4 The SIZE Statement 

The SIZE statement defines the sizes of the primitive C data types in terms of bytes. The form of the
SIZE statement is

<size_stmt>:        size <size_def_list> ;

<size_def_list>:    <size_def_list> , <size_def>
                    <size_def>

<size_def>:         <integer> ( <type_list> )

<type_list>:        <type_list> , <type>
                    <type>

The integers specify sizes in bytes; the types are the names of primitive C data types (as specified in the
TYPENAMES statement) with the corresponding size. For example, the SIZE statement used in the
HIS-6000 implementation is

size 1(char),4(int,float),8(double);

All addresses computed by the compiler are in terms of byte addressing; byte addresses are converted to
word addresses for non-character operations by the macro definitions. For example, on the HIS-6000, if
the first element of an integer array begins at offset 0 in the static area, then subsequent elements of
the array are at offsets 4, 8, 12, 16, etc.

1.5 The ALIGN Statement 

The ALIGN statement defines the alignment factors of the primitive C data types; these alignment factors
are in bytes. The (byte) address of a variable with an alignment factor "n" must be zero modulo "n"; for
example, on the HIS-6000, the (byte) address of an integer must be a multiple of 4. An alignment factor
must be divisible by all smaller alignment factors; this allows the compiler to assign addresses relative to
a base which satisfies the highest alignment restriction. The form of the ALIGN statement is similar to
that of the SIZE statement; for example, the ALIGN statement used in the HIS-6000 implementation is

    align 1(char), 4(int,float), 8(double);
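
The two rules above (addresses are zero modulo the alignment factor, and each factor is divisible by all smaller ones) can be sketched as follows. This is an illustration under stated assumptions, not the compiler's allocation routine; align_up, factors_are_compatible, and assign_offsets are hypothetical names.

```python
# Illustrative sketch only; all names here are hypothetical.

def align_up(offset, factor):
    """Smallest multiple of `factor` that is not less than `offset`."""
    return (offset + factor - 1) // factor * factor


def factors_are_compatible(factors):
    """True if every alignment factor is divisible by all smaller factors."""
    ordered = sorted(set(factors))
    return all(big % small == 0
               for i, big in enumerate(ordered)
               for small in ordered[:i])


def assign_offsets(decls, sizes, aligns):
    """Assign byte offsets to (name, ctype) pairs relative to an aligned base."""
    offsets = {}
    next_free = 0
    for name, ctype in decls:
        next_free = align_up(next_free, aligns[ctype])
        offsets[name] = next_free
        next_free += sizes[ctype]
    return offsets
```

Because the base is assumed to satisfy the highest alignment restriction, aligning each offset relative to the base is enough to align the final address.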

1.6 The CLASS Statement 

The CLASS statement is an optional statement which allows the implementer to define classes of abstract 
machine registers which are used in similar ways; the register classes so defined can then be used in the
machine description as abbreviations for the corresponding lists of registers. The form of the CLASS
statement is

    <class_stmt>:       class <class_def_list> ;

    <class_def_list>:   <class_def_list> , <class_def>
                        <class_def>

    <class_def>:        <name> ( <register_list> )

    <register_list>:    <register_list> , <register>
                        <register>

The name is the name of the register class; the registers are the names of the abstract machine registers
(as specified in the REGNAMES statement) which make up the corresponding register class. The CLASS
statement used in the HIS-6000 implementation is

    class x(x0,x1,x2,x3,x4), r(a,q);

This statement defines the class of index registers X and the class of general registers R. 

1.7 The CONFLICT Statement 

The CONFLICT statement is an optional statement which allows the implementer to specify abstract 
machine registers which conflict in the actual implementation. The form of the CONFLICT statement is 

    <conflict_stmt>:       conflict <conflict_def_list> ;

    <conflict_def_list>:   <conflict_def_list> , <conflict_def>
                           <conflict_def>

    <conflict_def>:        ( <register> , <register> )

Each register pair specifies two abstract machine registers such that only one of the registers can be in
use at one time. The CONFLICT statement used in the HIS-6000 implementation is

    conflict (a,f), (q,f);

which indicates that the F register conflicts with both the A and Q registers. 
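
A minimal sketch of how a register allocator might consult such conflict pairs is given below. It is an assumption about usage, not the compiler's actual data structure; CONFLICTS, conflicts, and can_allocate are hypothetical names.

```python
# Illustrative sketch only; the representation is an assumption.
CONFLICTS = {frozenset(("a", "f")), frozenset(("q", "f"))}  # from: conflict (a,f), (q,f);


def conflicts(reg1, reg2):
    """True if the two registers may not be in use at the same time."""
    return frozenset((reg1, reg2)) in CONFLICTS


def can_allocate(reg, in_use):
    """True if `reg` is free and conflicts with no register currently in use."""
    return reg not in in_use and not any(conflicts(reg, other) for other in in_use)
```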

1.8 The SAVEAREASIZE Statement

The SAVEAREASIZE statement is used to specify the size of the save area which is reserved at the 
beginning of each stack frame. The save area is generally used for saving registers upon entry to a 
function, for chaining stack frames together, and for holding other per-invocation information. The form 
of the SAVEAREASIZE statement is 

saveareasize <integer> ; 

The integer specifies the size (in bytes) of the save area. The save area used in the HIS-6000 
implementation is 16 bytes (4 words) long. 

1.9 The POINTER Statement 

The POINTER statement defines classes of pointers according to their resolution; these pointer classes 
represent different implementations of pointers on the target machine. The resolution of a pointer 
corresponds to the alignment factors of the objects to which it can refer; in particular, a pointer with a 
resolution of "n" bytes can refer only to objects whose alignment factors are multiples of "n" bytes. The 
primary use of pointer classes is on machines whose smallest addressable unit is larger than bytes; in this 
case, two pointer classes are defined: one which can resolve only machine-addressable units and another 
which can resolve individual bytes. By defining separate pointer classes, the implementer allows 
computations involving pointers which are known to refer to machine-addressable units to be performed 
in terms of machine-addressable units, and therefore more efficiently. The form of the POINTER 
statement is

    <pointer_stmt>:       pointer <pointer_def_list> ;

    <pointer_def_list>:   <pointer_def_list> , <pointer_def>
                          <pointer_def>

    <pointer_def>:        <name> ( <integer> )

The names define the names of the pointer classes; the integers are the resolutions of the corresponding
pointer classes. At least one and no more than four pointer classes may be defined; these pointer classes
are referred to as P0, P1, P2, and P3 in the specification of the AMOPs.

The POINTER statement used in the HIS-6000 implementation is

    pointer p0(1), p1(4);

P0 is the class of pointers to byte-aligned objects; P1 is the class of pointers to word-aligned objects.
Word pointers can be held and operated upon in the index registers; byte pointers are operated upon in
the general registers and indirected through by subroutine.
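
The resolution rule can be sketched as follows. The rule that a pointer of resolution "n" can refer only to objects whose alignment factors are multiples of "n" comes from the text; choosing the coarsest usable class is an assumption made for illustration, and POINTER_CLASSES, can_refer, and coarsest_class are hypothetical names.

```python
# Illustrative sketch only; the selection policy is an assumption.
POINTER_CLASSES = {"p0": 1, "p1": 4}   # name -> resolution in bytes (HIS-6000 example)


def can_refer(resolution, alignment):
    """A pointer of the given resolution can refer to an object with this alignment factor."""
    return alignment % resolution == 0


def coarsest_class(alignment, classes=POINTER_CLASSES):
    """Pointer class with the largest resolution that can still refer to the object."""
    usable = {name: res for name, res in classes.items() if can_refer(res, alignment)}
    return max(usable, key=usable.get)
```

Preferring the coarsest class reflects the efficiency argument made above: pointers known to refer to word-aligned objects can be handled in machine-addressable units.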

1.10 The OFFSETRANGE Statement

The OFFSETRANGE statement is an optional statement which defines, for each pointer class defined in the
POINTER statement, the range of offsets permitted in references indirect via such a pointer (see section
2.1.1.2). The form of the OFFSETRANGE statement is

    <offsetrange_stmt>:   offsetrange <offset_def_list> ;

    <offset_def_list>:    <offset_def_list> , <offset_def>
                          <offset_def>

    <offset_def>:         <pointer_class_name> ( <lo_bound> , <hi_bound> )

where the lo_bounds and hi_bounds are optional integers. Each offset_def specifies the range of
allowable offsets for a particular pointer class; this range is the set of integers not less than lo_bound
and not greater than hi_bound. If a bound is not present, then the range is considered unbounded in the
corresponding direction. If no range is specified for a pointer class, then only zero offsets are allowed;
any specified range must include zero.
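
The range test described above can be sketched in a few lines. This is an illustration only; offset_allowed and range_is_valid are hypothetical names, and a missing bound is modelled as None.

```python
# Illustrative sketch only; a missing bound is represented by None.

def offset_allowed(offset, lo=None, hi=None):
    """True if `offset` is not less than `lo` and not greater than `hi`."""
    if lo is not None and offset < lo:
        return False
    if hi is not None and offset > hi:
        return False
    return True


def range_is_valid(lo=None, hi=None):
    """A specified range must include zero."""
    return offset_allowed(0, lo, hi)
```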

1.11 The RETURNREG Statement

The RETURNREG statement specifies in which registers functions returning values of various types return
those values. Registers must be specified for types INT and DOUBLE as well as for all pointer classes
defined in the POINTER statement. The form of the RETURNREG statement is

    <returnreg_stmt>:    returnreg <return_def_list> ;

    <return_def_list>:   <return_def_list> , <return_def>
                         <return_def>

    <return_def>:        <register> ( <type_list> )

The types may be names of primitive C data types as defined in the TYPENAMES statement or names of
pointer classes as defined in the POINTER statement; the corresponding register is defined to be the
register in which functions returning values of those types will place the returned values. For example,
the RETURNREG statement used in the HIS-6000 implementation is

    returnreg q(int,p0,p1), f(double);

It is advised that pointers of all classes be returned in the same register in a compatible form to avoid
errors caused by [illegible].

1.12 The TYPE Statement

The TYPE statement defines which registers are to be used in the evaluation of expressions to hold
values of the various abstract machine data types. The form of the TYPE statement is

    <type_stmt>:       type <type_def_list> ;

    <type_def_list>:   <type_def_list> , <type_def>
                       <type_def>

    <type_def>:        <type> ( <register_list> )

The type is the name of a primitive C data type as defined in the TYPENAMES statement or the name of a
pointer class as defined in the POINTER statement; the registers are the abstract machine registers or
classes of abstract machine registers which may be used to hold values of the corresponding type. For
example, the TYPE statement used in the HIS-6000 implementation is

    type char(r), int(r), float(f), double(f), p0(r), p1(x);

The registers specified in the TYPE statement need not include every register physically capable of
holding a particular type; only those registers which the implementer desires to use in evaluating
expressions of that type should be included in the TYPE statement. In the HIS-6000 example, only the
index registers (X) are specified for the class P1, even though the general registers (R) are
capable of holding such pointers and, in fact, a general register (the Q register) is used to hold such a
pointer when returned by a function call; this was done in order to avoid unnecessary use of the
general registers, which are relatively few in number.

2. The OPLOC Section 

In the OPLOC section of the machine description, the AMOPs are defined in terms of the possible locations
of their operands and the corresponding locations of their results. Each definition consists of a list of
triples called OPLOCs; an OPLOC specifies a particular set of first operand locations, second operand
locations, and result locations. An OPLOC may also specify that one or more registers are clobbered by
the execution of the code for an abstract machine instruction; this informs the code generator that it may
be necessary to emit instructions to save the contents of the clobbered registers before emitting the
abstract machine instruction. The forms of an OPLOC are

    <loc_expr> , <loc_expr> , <loc_expr> ;

and

    <loc_expr> , <loc_expr> , <loc_expr> <clobber> ;

where the clobber is a list of one or more register names separated by commas and enclosed in square
brackets. The three location expressions specify locations for the first operand, second operand, and
result, respectively. A location expression specifies either a set of registers or a set of memory reference
classes; these sets may be specified using particular registers or memory reference classes along with
the operations of union ('|') and negation ('~'). The syntax of a location expression is

    <loc_expr>:        <register_expr>
                       <memory_expr>
                       1
                       2
                       <null>

    <register_expr>:   <register_expr> | <register_expr>
                       ~ <register_expr>
                       ( <register_expr> )
                       <register_name>
                       <register_class_name>

    <memory_expr>:     <memory_expr> | <memory_expr>
                       ~ <memory_expr>
                       ( <memory_expr> )
                       <memory_ref_class_name>
                       M
                       indirect

The negation operator '~' has precedence over the union operator '|'. The location expressions "1" and
"2" may be used only for the location of a result; they specify that the result is placed in the first or
second operand location, respectively. Only the location expression for the second operand of a unary
AMOP may be null. The location expression 'M' represents the set of all memory reference classes; the
location expression "indirect" represents the set of all indirect memory reference classes.
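
Since a location expression denotes a set, union and negation can be modelled directly with set operations. The sketch below is illustrative only: the register universe is assumed from the HIS-6000 statements quoted in this appendix, and lookup, union, and negate are hypothetical names.

```python
# Illustrative sketch only; the register universe is an assumption.
REGISTERS = frozenset({"x0", "x1", "x2", "x3", "x4", "a", "q", "f"})
CLASSES = {
    "x": frozenset({"x0", "x1", "x2", "x3", "x4"}),  # index registers
    "r": frozenset({"a", "q"}),                      # general registers
}


def lookup(name):
    """A register class name denotes its members; a register name denotes itself."""
    return CLASSES.get(name, frozenset({name}))


def union(left, right):
    """The '|' operator: registers in either operand."""
    return left | right


def negate(operand, universe=REGISTERS):
    """The '~' operator: all registers not in the operand."""
    return universe - operand
```

Under this model the expression ~r|a is evaluated as union(negate(lookup("r")), lookup("a")), because negation binds more tightly than union.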

The OPLOCs are associated with AMOPs in location definitions, which consist of one or more AMOP labels
followed by one or more OPLOCs:

    <loc_def>:      <AMOP_list> <OPLOC_list>

    <AMOP_list>:    <AMOP_list> <AMOP_label>
                    <AMOP_label>

    <AMOP_label>:   <AMOP> :

    <OPLOC_list>:   <OPLOC_list> <OPLOC>
                    <OPLOC>

Each AMOP in the list of AMOP labels is associated with the list of OPLOCs; each OPLOC in the list of 
OPLOCs represents an acceptable set of operand/result locations for each of the AMOPs. For example, 
the location definition 

    +d: -d: *d: /d:    f,M,f;

used in the HIS-6000 machine description specifies that the AMOPs for double-precision floating-point
addition, subtraction, multiplication, and division all take their first operand in the F register, their second
operand in memory, and place their result in the F register. Another example is the location definition

    =<<: =>>:    M,a,q; M,q,a;

which specifies that the AMOPs left-shift-assignment and right-shift-assignment both take their first
operand in memory, their second operand in a general register, and place their result in the other general
register. A third example is the location definition

    *i: /i:    q,M,q[a];

which specifies that the AMOPs for integer multiplication and division both take their first operand in the
Q register, their second operand in memory, place their result in the Q register, and clobber the contents
of the A register in the process. Note that the location definitions

    +i:    r,M,1;

and

    +i:    r,M,r;

are not equivalent. The second definition allows the code generator to emit an abstract machine
instruction which adds an integer in memory to an integer in the A register and places the result in the Q
register; the first definition requires that the result be placed in the register containing the first operand.
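
The difference between a result location of "1" and a result location of "r" can be made concrete with a small matching function. This is a sketch under stated assumptions, not the code generator's algorithm: an OPLOC is modelled as a triple whose result entry is either a set of locations or the integer 1 or 2, and the name matches is hypothetical.

```python
# Illustrative sketch only; the OPLOC representation is an assumption.
R = frozenset({"a", "q"})      # the general register class
MEM = frozenset({"M"})         # stand-in for any memory location

OPLOC_RESULT_IN_FIRST = (R, MEM, 1)   # models  r,M,1;
OPLOC_RESULT_IN_ANY_R = (R, MEM, R)   # models  r,M,r;


def matches(oploc, first, second, result):
    """True if the given operand and result locations are acceptable to the OPLOC."""
    first_set, second_set, result_spec = oploc
    if first not in first_set or second not in second_set:
        return False
    if result_spec == 1:
        return result == first      # result placed in the first operand location
    if result_spec == 2:
        return result == second     # result placed in the second operand location
    return result in result_spec
```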

The OPLOC section of the machine description consists of a sequence of location definitions which define
the AMOPs of the intermediate language. (A small number of AMOPs should not be defined in the OPLOC
section of the machine description; these are indicated in Appendix II.) An AMOP may appear no more
than once in the OPLOC section of the machine description.

3. The Macro Section

The macro section of the machine description contains the macro definitions for the AMOPs; these macro
definitions expand into the object-language statements needed to interpret the corresponding abstract
machine instructions. A macro definition consists of a list of AMOP labels followed by a list of character
string constants. The list of AMOP labels specifies that abstract machine instructions for these AMOPs are
to be emitted as macro calls which refer to this macro definition. The character strings make up the body
of the macro definition; they are written out in sequence as the expansion of a corresponding macro call.
The character strings may have optional location prefixes which test for a specific set of locations of the
operands and result; a character string with an attached location prefix is included in the expansion of the
macro call only if the test specified by the location prefix succeeds. A character string may contain
embedded macro calls and references to the arguments of the macro call (see Appendix VI, section 4).
The macro definition for an AMOP must correspond to the location definition of the AMOP in that correct
code must be generated for all combinations of operand/result locations that are allowed by the location
definition.

The macro definitions can refer to the AMOP and the operand/result locations by using the following
abbreviations:

    abbreviation   expansion     meaning

    #O             %n(#0)        symbolic representation of operation
    #F             %n(#3,#4)     symbolic representation of first operand
    #S             %n(#5,#6)     symbolic representation of second operand
    #R             %n(#1,#2)     symbolic representation of result
    #'O            #0            internal representation of operation
    #'F            #3,#4         internal representation of first operand
    #'S            #5,#6         internal representation of second operand
    #'R            #1,#2         internal representation of result

Recall that in the intermediate language representation of an abstract machine instruction, the first
argument of the macro call is the AMOP opcode, and the following arguments are REFs for the result, first
operand, and second operand (see section 2.1.1.2). The macro 'n' is the implementer-defined NAME
macro which can return any convenient symbolic representation for an operation or operand/result
location; it is assumed to be implemented as a C routine called ANAME (see Appendix VI, section 4).



An example of a simple macro definition is the definition for integer addition used in the HIS-6000
machine description. The location definition is

    +i:    r,M,1;

and the macro definition is

    +i:    "    AD#R    #S"

This location/macro definition of the AMOP '+i' expands to produce assembly-language statements such as

    ADA    X        (external variable "X")
    ADQ    3,DL     (literal "3")
    ADA    0,2      (indirect through X2)
    ADQ    5,7      (an automatic or temporary)

A more complicated macro definition is used for the AMOP '.ii' (move integer). This macro definition must
be capable of generating code to move an integer between a memory location and a general register or
from one general register to the other. Three character strings with location prefixes are used for the
three cases register-to-memory, memory-to-register, and register-to-register:

    .ii:
        (r,,M):    "    ST#F    #R"
        (M,,r):    "    LD#R    #F"
        (r,,r):    "    LLR     36"

The location prefixes consist of location expressions for the first operand, second operand, and result.
The operand and result locations of a particular macro call are compared to the location expressions in
the location prefix (comparison with a null location expression always succeeds); if all comparisons
succeed, the corresponding character string is included in the expansion of the macro call.
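
The selection of body strings by location prefix can be sketched as follows, loosely modelled on the move-integer definition. This is an illustration and not the macro processor itself: the placeholders {F} and {R} stand in for the #F and #R abbreviations, None models a null location expression, and MOVE_INT, loc_class, and expand are hypothetical names.

```python
# Illustrative sketch only; the data layout is an assumption.
GENERAL_REGISTERS = {"a", "q"}

MOVE_INT = [
    (("r", None, "M"), "\tST{F}\t{R}"),   # register to memory
    (("M", None, "r"), "\tLD{R}\t{F}"),   # memory to register
    (("r", None, "r"), "\tLLR\t36"),      # register to register
]


def loc_class(location):
    """Classify a location as a general register ('r') or memory ('M')."""
    return "r" if location in GENERAL_REGISTERS else "M"


def expand(macro, first, result):
    """Return the body strings whose location prefix matches the macro call."""
    lines = []
    for (want_first, want_second, want_result), body in macro:
        # A null (None) second-operand expression always matches.
        if want_first == loc_class(first) and want_result == loc_class(result):
            lines.append(body.format(F=first.upper(), R=result.upper()))
    return lines
```

Each body string is considered independently, so a macro definition may contribute several lines, one line, or none to a given expansion.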

The macro section of the machine description may also define explicitly named macros; these may be
keyword macros (see section [illegible]) or implementer-defined macros which are called in the definitions of
other macros. A named macro is defined by using the name of the macro in place of an AMOP in the
label(s) preceding the body of the macro definition. A single macro definition may have both AMOP and
macro name labels; this is useful when it is desired that the definition of one abstract machine instruction
itself contain another abstract machine instruction, since the "internal" names used to refer to the macro
definitions of AMOPs are not accessible to the writer of the machine description. An example of a
keyword macro definition in the HIS-6000 machine description is that for the ENTRY macro:

    en:    "    SYMDEF    #0"

The argument to the ENTRY macro is an assembler symbol as produced by the IDN macro (see Appendix
III).

The macro section of the machine description consists of the reserved word "macros" followed by a
sequence of macro definitions. Macro definitions must be provided for most of the AMOPs of the
intermediate language (exceptions are indicated in Appendix II) and for all of the keyword macros of the
intermediate language which are not defined by C routines. An AMOP or a macro name may not be
defined more than once in the macro section of the machine description.

Appendix II - The Intermediate Language: AMOPs

The operations of the abstract machine are represented in the intermediate language as three-address
instructions; the operators of these instructions, called abstract machine operators (AMOPs), are described
in the tables below. For each AMOP is listed its opcode (in octal), its symbolic representation in the
machine description, the types of its operands and result, and a description of the basic operation
involved. The type entry consists of a list of types for the first operand, second operand (if any), and
result of an AMOP, in that order; the types are taken from the following list of abbreviations:



    c       character
    i       integer
    f       floating-point
    d       double-precision floating-point
    x       any type
    p       any pointer
    p0      class 0 pointer
    p1      class 1 pointer
    p2      class 2 pointer
    p3      class 3 pointer
    l       a location (the result of a jump)



The following notes are referenced in the AMOP tables: 

1 - This AMOP should be defined only if the corresponding pointer classes are defined. 

2 - The definition of this AMOP is optional. 

3 - OPLOCs should not be specified for this AMOP.

4 - This AMOP is used only in the tree representation of expressions internal to the code
generation phase; it should not appear in the machine description.

5 - This AMOP causes a side-effect on its (first) operand, which must be an lvalue;
therefore, all OPLOCs for this AMOP must specify memory as the location of the (first)
operand.

Unary Abstract Machine Operators




opcode   symbol   types      notes   basic operation

0000     -ui      i,i                unary minus
0001     -ud      d,d                unary minus
0002     ++bi     i,i        5       pre-increment
0003     ++ai     i,i        5       post-increment
0004     --bi     i,i        5       pre-decrement
0005     --ai     i,i        5       post-decrement
0006     .BNOT    i,i                bitwise negation
0007     !        x,i        4       truth-value negation
0012     .sw      [illegible]        switch
0013     ++bc     c,i        5       pre-increment
0014     ++ac     c,i        5       post-increment
0015     --bc     c,i        5       pre-decrement
0016     --ac     c,i        5       post-decrement
0017     &u0      x,p0               address of
0020     &u1      x,p1       1       address of
0021     &u2      x,p2       1       address of
0022     &u3      x,p3       1       address of
0023     *u       p,x        4       indirection
0024     ==0p0    p0,l       2       jump on null pointer
0025     ==0p1    p1,l       1,2     jump on null pointer
0026     ==0p2    p2,l       1,2     jump on null pointer
0027     ==0p3    p3,l       1,2     jump on null pointer
0030     !=0p0    p0,l       2       jump on non-null pointer
0031     !=0p1    p1,l       1,2     jump on non-null pointer
0032     !=0p2    p2,l       1,2     jump on non-null pointer
0033     !=0p3    p3,l       1,2     jump on non-null pointer

Conversion Abstract Machine Operators



opcode   symbol   types      notes   basic operation

0040     .ci      c,i                convert c to i
0041     .cf      c,f                convert c to f
0042     .cd      c,d                convert c to d
0043     .ic      i,c                convert i to c
0044     .if      i,f                convert i to f
0045     .id      i,d                convert i to d
0046     .ip0     i,p0               convert i to p0
0047     .ip1     i,p1       1       convert i to p1
0050     .ip2     i,p2       1       convert i to p2
0051     .ip3     i,p3       1       convert i to p3
0052     .fc      f,c                convert f to c
0053     .fi      f,i                convert f to i
0054     .fd      f,d                convert f to d
0055     .dc      d,c                convert d to c
0056     .di      d,i                convert d to i
0057     .df      d,f                convert d to f
0060     .p0i     p0,i               convert p0 to i
0061     .p0p1    p0,p1      1       convert p0 to p1
0062     .p0p2    p0,p2      1       convert p0 to p2
0063     .p0p3    p0,p3      1       convert p0 to p3
0064     .p1i     p1,i       1       convert p1 to i
0065     .p1p0    p1,p0      1       convert p1 to p0
0066     .p1p2    p1,p2      1       convert p1 to p2
0067     .p1p3    p1,p3      1       convert p1 to p3
0070     .p2i     p2,i       1       convert p2 to i
0071     .p2p0    p2,p0      1       convert p2 to p0
0072     .p2p1    p2,p1      1       convert p2 to p1
0073     .p2p3    p2,p3      1       convert p2 to p3
0074     .p3i     p3,i       1       convert p3 to i
0075     .p3p0    p3,p0      1       convert p3 to p0
0076     .p3p1    p3,p1      1       convert p3 to p1
0077     .p3p2    p3,p2      1       convert p3 to p2



Binary Abstract Machine Operators

opcode   symbol   types      notes   basic operation

0100     +i       i,i,i              addition
0101     =+i      i,i,i      2,5     addition-assignment
0102     +d       d,d,d              addition
0103     =+d      d,d,d      2,5     addition-assignment
0104     -i       i,i,i              subtraction
0105     =-i      i,i,i      2,5     subtraction-assignment
0106     -d       d,d,d              subtraction
0107     =-d      d,d,d      2,5     subtraction-assignment
0110     *i       i,i,i              multiplication
0111     =*i      i,i,i      2,5     multiplication-assignment
0112     *d       d,d,d              multiplication
0113     =*d      d,d,d      2,5     multiplication-assignment
0114     /i       i,i,i              division
0115     =/i      i,i,i      2,5     division-assignment
0116     /d       d,d,d              division
0117     =/d      d,d,d      2,5     division-assignment
0120     %        i,i,i              modulo
0121     =%       i,i,i      2,5     modulo-assignment
0122     <<       i,i,i              left-shift
0123     =<<      i,i,i      2,5     left-shift-assignment
0124     >>       i,i,i              right-shift
0125     =>>      i,i,i      2,5     right-shift-assignment
0126     &        i,i,i              bitwise AND
0127     =&       i,i,i      2,5     bitwise AND-assignment
0130     ^        i,i,i              bitwise XOR
0131     =^       i,i,i      2,5     bitwise XOR-assignment
0132     .OR      i,i,i              bitwise OR
0133     =OR      i,i,i      2,5     bitwise OR-assignment
0134     &&       x,x,i      4       truth-value AND
0135     .TVOR    x,x,i      4       truth-value OR
0136     -p0p0    p0,p0,i            pointer subtraction
0137     =        x,x,x      4       assignment
0146     +p0      p0,i,p0            increment pointer by
0147     +p1      p1,i,p1    1       increment pointer by
0150     +p2      p2,i,p2    1       increment pointer by
0151     +p3      p3,i,p3    1       increment pointer by
0152     -p0      p0,i,p0            decrement pointer by
0153     -p1      p1,i,p1    1       decrement pointer by
0154     -p2      p2,i,p2    1       decrement pointer by
0155     -p3      p3,i,p3    1       decrement pointer by

opcode   symbol   types      notes   basic operation

0160     .cc      c,c        3       move character
0161     .ii      i,i        3       move integer
0162     .ff      f,f        3       move float
0163     .dd      d,d        3       move double
0164     .p0p0    p0,p0      3       move pointer p0
0165     .p1p1    p1,p1      1,3     move pointer p1
0166     .p2p2    p2,p2      1,3     move pointer p2
0167     .p3p3    p3,p3      1,3     move pointer p3
0171     ?        x,x,x      4       conditional
0172     :        x,x,x      4       conditional
0200     ==i      i,i,l              jump on equal
0201     !=i      i,i,l              jump on not equal
0202     <i       i,i,l              jump on less than
0203     >i       i,i,l              jump on greater than
0204     <=i      i,i,l              jump on less than or equal
0205     >=i      i,i,l              jump on greater than or equal
0206     ==d      d,d,l              jump on equal
0207     !=d      d,d,l              jump on not equal
0210     <d       d,d,l              jump on less than
0211     >d       d,d,l              jump on greater than
0212     <=d      d,d,l              jump on less than or equal
0213     >=d      d,d,l              jump on greater than or equal
0214     ==p0     p0,p0,l            jump on equal
0215     !=p0     p0,p0,l            jump on not equal
0216     <p0      p0,p0,l            jump on less than
0217     >p0      p0,p0,l            jump on greater than
0220     <=p0     p0,p0,l            jump on less than or equal
0221     >=p0     p0,p0,l            jump on greater than or equal
0222     ==p1     p1,p1,l    1,2     jump on equal
0223     !=p1     p1,p1,l    1,2     jump on not equal
0224     <p1      p1,p1,l    1,2     jump on less than
0225     >p1      p1,p1,l    1,2     jump on greater than
0226     <=p1     p1,p1,l    1,2     jump on less than or equal
0227     >=p1     p1,p1,l    1,2     jump on greater than or equal
0230     ==p2     p2,p2,l    1,2     jump on equal
0231     !=p2     p2,p2,l    1,2     jump on not equal
0232     <p2      p2,p2,l    1,2     jump on less than
0233     >p2      p2,p2,l    1,2     jump on greater than
0234     <=p2     p2,p2,l    1,2     jump on less than or equal
0235     >=p2     p2,p2,l    1,2     jump on greater than or equal
0236     ==p3     p3,p3,l    1,2     jump on equal
0237     !=p3     p3,p3,l    1,2     jump on not equal
0240     <p3      p3,p3,l    1,2     jump on less than
0241     >p3      p3,p3,l    1,2     jump on greater than
0242     <=p3     p3,p3,l    1,2     jump on less than or equal
0243     >=p3     p3,p3,l    1,2     jump on greater than or equal

Abstract Machine Operators, continued

opcode   symbol   types      notes   basic operation

0260     ++bp0    p0,i,p0    5       pre-increment by
0261     ++ap0    p0,i,p0    5       post-increment by
0262     --bp0    p0,i,p0    5       pre-decrement by
0263     --ap0    p0,i,p0    5       post-decrement by
0264     ++bp1    p1,i,p1    1,5     pre-increment by
0265     ++ap1    p1,i,p1    1,5     post-increment by
0266     --bp1    p1,i,p1    1,5     pre-decrement by
0267     --ap1    p1,i,p1    1,5     post-decrement by
0270     ++bp2    p2,i,p2    1,5     pre-increment by
0271     ++ap2    p2,i,p2    1,5     post-increment by
0272     --bp2    p2,i,p2    1,5     pre-decrement by
0273     --ap2    p2,i,p2    1,5     post-decrement by
0274     ++bp3    p3,i,p3    1,5     pre-increment by
0275     ++ap3    p3,i,p3    1,5     post-increment by
0276     --bp3    p3,i,p3    1,5     pre-decrement by
0277     --ap3    p3,i,p3    1,5     post-decrement by

Appendix III - The Intermediate Language: Keyword Macros

The keyword macros of the intermediate language are described below in alphabetical order. Each 
section is headed by the name of a macro and its calling sequence; following is a description of the 
arguments and the intended function of the macro call. 

1. ADCONn: %An(NAME) [n=0,1,2,3]

This is a set of macros, one for each possible pointer class. NAME is an object-language symbol
constructed from an identifier by the IDN macro. The expansion of an ADCONn macro should define a 
pointer constant of pointer class "n" which points to the external variable or function with the given 
name. This macro is used in the initialization of static and external pointers and arrays of pointers. 

2. ALIGN: %AL(N)

N is an integer specifying the CTYPE (an internal type specification) of an object for which the 
appropriate alignment of the location counter must be made. The relevant CTYPEs are: 

    value   ctype

    2       char
    3       int
    4       float
    5       double
    6-9     pointer



The expansion of the macro call should be the pseudo-operations needed (if any) to properly align the 
location counter. This macro is used in the initialization of static and external variables. 

3. CALL: %CA(NARGS,ARGP,0,FBASE,FOFFSET)

The CALL macro generates a function call. NARGS is an integer specifying the number of arguments to
the function call; ARGP is an integer specifying the byte offset in the caller's stack frame of the
arguments which have been so placed by previous instructions. FBASE and FOFFSET are integers which
together make up a REF specifying the location of the function being called (it may be indirect through a
pointer in a register); these are passed as arguments 3 and 4 of the macro call so that they may be 
referenced as #F in the macro definition. 

4. CHAR: %C(I)

The CHAR macro produces a definition of a character constant whose value is the integer I; it is used in 
the initialization of static and external characters and arrays of characters. When producing code for an 
assembler which does not have a byte location counter (for example, the HIS-6000 assembler GMAP), the 
characters produced by CHAR macro calls must be stored in a buffer until either enough are accumulated 
to fill a machine word or a macro call other than CHAR is issued; in this case, all macros which may follow 
a CHAR macro must first check to see if there are any characters in the buffer and if so, print the 
appropriate statement and clear the buffer. 
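
The buffering discipline described for assemblers without a byte location counter can be sketched as follows. This is an illustration, not the HIS-6000 macro routines: the class CharBuffer, its method names, the 4-characters-per-word figure, and the zero padding of a partial word are all assumptions.

```python
# Illustrative sketch only; padding a partial word with zeros is an assumption.
class CharBuffer:
    def __init__(self, chars_per_word=4):
        self.chars_per_word = chars_per_word
        self.pending = []    # characters not yet written out
        self.output = []     # emitted statements, in order

    def char(self, value):
        """Model a CHAR macro call: buffer the character, emit when a word is full."""
        self.pending.append(value)
        if len(self.pending) == self.chars_per_word:
            self.flush()

    def flush(self):
        """Emit any buffered characters as one word and clear the buffer."""
        if self.pending:
            padding = [0] * (self.chars_per_word - len(self.pending))
            self.output.append(("word", tuple(self.pending + padding)))
            self.pending = []

    def other(self, statement):
        """Model any macro that may follow CHAR: flush first, then emit."""
        self.flush()
        self.output.append(("stmt", statement))
```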

5. DOUBLE: %D(I)

The DOUBLE macro produces a definition of a non-negative double-precision floating-point constant 
whose C source representation is stored in the internal compiler table CSTORE at an offset specified by 
the integer I (the compiler itself does not use any floating-point operations). This macro is used in the 
initialization of static and external double-precision floating-point variables and arrays.

6. END: %END()

The END macro marks the end of the intermediate language program. It may produce an END statement, if
needed, or signal that any processing associated with the end of the program should be performed.

7. ENTRY: %EN(NAME)

NAME is an object language symbol constructed from an identifier by the IDN macro. The expansion of
the ENTRY macro should define the symbol as an entry point, that is, one which is defined in the current
program but accessible to other programs.

8. EPILOG: %EP(FUNCNO,FRAMESIZE)

The EPILOG macro produces the epilog code for a C function. The epilog code should restore the
environment of the calling function and return to that function. In the HIS-6000 implementation, these
actions are performed by a subroutine. FUNCNO and FRAMESIZE are integers which specify the internal
function number of the function and the size in bytes of its stack frame, respectively. In the HIS-6000
implementation, these integers are used to define an assembly-language symbol whose value is the size in
words of the stack frame; this symbol is used by the code produced by the PROLOG macro which allocates
the stack frame.

9. EQU: %EQ(NAME)

NAME is an object language symbol constructed from an identifier by the IDN macro; it is to be defined as
having a value equal to the current value of the location counter.

10. EXTRN: %EX(NAME)

The EXTRN macro is similar to the ENTRY macro except that it defines the symbol to be an external
reference, that is, one which is used in the current program but assumed to be defined in another
program.

11. FLOAT: %F(I)

The FLOAT macro produces a definition of a non-negative single-precision floating-point constant; the
argument has the same interpretation as that of the DOUBLE macro.

12. GOTO: %GO(0,BASE,OFFSET)

The GOTO macro produces an unconditional jump to a location denoted in the source program by a label
constant or expression. BASE and OFFSET together make up a REF which specifies the target location of
the jump; these are passed as arguments 1 and 2 of the macro call so that they may be referenced as #R
in the macro definition.

13. HEAD: %HD()

The HEAD macro marks the beginning of the intermediate language program. It may produce header
statements, if needed, or signal that any initialization processing should be performed.

14. IDN: %I(X)

The IDN macro should expand to the object language representation of the identifier whose C source 
representation is stored in the internal compiler table CSTORE at an offset specified by the integer X. 
The processing performed by this macro may include the truncation of long names, the replacement of the 
underline character (which is allowed in C identifiers), and the insertion of special character(s) to avoid
conflicts between C identifiers and other object language symbols.
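
The kind of processing described for the IDN macro can be sketched in present-day C as follows. This is
only an illustration under stated assumptions: the function name mangle_idn, the six-character limit
MAX_SYM, and the choice of '.' as the replacement for the underline character are hypothetical and are
not taken from any particular implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limit on the length of an object language symbol. */
#define MAX_SYM 6

/* Copy at most MAX_SYM characters of the C identifier `src` into `dst`,
   replacing each underline character with '.' (an assumed replacement).
   `dst` must have room for MAX_SYM + 1 characters. */
static char *mangle_idn(const char *src, char *dst)
{
    size_t i;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < MAX_SYM && src[i] != '\0'; i++)
        dst[i] = (src[i] == '_') ? '.' : src[i];
    dst[i] = '\0';
    return dst;
}
```

Under these assumptions a long identifier is truncated before it reaches the assembler, so two C
identifiers that agree in their first MAX_SYM characters would collide; a real implementation has to
decide whether that is acceptable on its target system.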
15. INT: %IN(I)

The INT macro produces a definition of an integer constant whose value is specified by the integer I. It
is used in the initialization of static and external variables and arrays and in the construction of tables for
the LSWITCH macro.

16. LABCON: %LC(N)

The LABCON macro generates an address constant whose value is the address corresponding to internal
label number N. The LABCON macro is used to construct the tables for the LSWITCH and TSWITCH
macros.

17. LABDEF: %L(N)

The LABDEF macro defines the location of internal label number N. 

18. LN: %LN(N)

The LN macro associates the line in the source program whose line number is specified by the integer N 
with the current value of the location counter. This macro may optionally produce a comment line in the 
object program to aid in the reading of the object program, or it may define a line-number symbol to be
used in conjunction with a debugging system.

19. LSWITCH: %LS(N,LBASE,LOFFSET,IBASE,IOFFSET)

The LSWITCH macro should generate code which jumps according to the value of the integer whose
location is given by IBASE and IOFFSET (selected from the locations permitted by the OPLOC for the .sw
operation). This macro is immediately followed by N (N>0) INT macros (the cases), which are immediately
followed by N LABCON macros (the corresponding labels). A search should be made through the case list;
if a match is found, a jump should be made to the label denoted by the corresponding LABCON macro. If
the integer matches none of the list entries, then a jump should be made to the internal label defined by
LBASE and LOFFSET.
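
The behavior required of the code generated for LSWITCH can be modelled by a linear search. The
following is a minimal sketch in C, not the code any implementation emits: labels are represented by
plain integers, and the name lswitch_target and its parameters are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Return the label paired with `value` in the case list, or `deflabel`
   (standing for the label given by LBASE and LOFFSET) if no case matches.
   cases[i] plays the role of the i-th INT macro and labels[i] that of
   the i-th LABCON macro. */
static int lswitch_target(int value, const int *cases, const int *labels,
                          size_t n, int deflabel)
{
    size_t i;

    assert(n == 0 || (cases != NULL && labels != NULL));
    for (i = 0; i < n; i++)
        if (cases[i] == value)
            return labels[i];
    return deflabel;
}
```

The cost of this search grows with the number of cases, which is why a separate table-indexed form is
worthwhile when the case values are dense.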

20. NDOUBLE: %ND(I)

The NDOUBLE macro is the same as the DOUBLE macro except that the value of the defined constant is
made negative. 

21. NFLOAT: %NF(I)

The NFLOAT macro is the same as the FLOAT macro except that the value of the defined constant is made
negative.

22. PROLOG: %P(FUNCNO,FUNCNAME)

The PROLOG macro produces the prolog code for a C function. FUNCNAME is an integer representing the
name of the function as it appears in the source program; its interpretation is the same as that of the
argument of the IDN macro. FUNCNO is an integer which specifies the internal function number of the
function; it may be used in conjunction with the EPILOG macro to access the size of the function's stack
frame. The PROLOG macro should define the entry point name and produce the code necessary to save
the environment of the calling function and to set up the environment of the called function using the
information provided in the function call. In the HIS-6000 implementation, these actions are performed by
a subroutine. The PROLOG macro call appears in the intermediate language program immediately before
the first instruction of the corresponding function.
23. RETURN: %RT()

The RETURN macro produces the statements needed to return from a function to the calling function. In
general, this macro will result in a transfer to the EPILOG code. The returned value of the function is
loaded by preceding macro calls into the appropriate register as specified in the RETURNREG statement of
the machine description.

24. STATIC: %ST(N,S)

The STATIC macro defines the location of the static variable whose internal static variable number is N; S
is the size of the static variable in bytes. Typically, this macro will define an assembly language symbol
by which the static variable can be referenced. 

25. STRCON: %SC(N)

The STRCON macro should generate a character pointer which points to the string constant whose
internal string number is N. The STRCON macro is used in the initialization of static and external
variables.

26. STRING: %SR()

The STRING macro marks the place in the object program where the string constants should be defined.
This macro is implemented as a C routine macro since substantial processing is involved.

27. TSWITCH: %TS(LO,LBASE,LOFFSET,IBASE,IOFFSET,HI)

The TSWITCH macro produces an indexed jump based on the value of the integer whose location is given
by IBASE and IOFFSET (selected from the locations permitted by the OPLOC for the .sw operation). This
macro is immediately followed by a sequence of HI-LO+1 LABCON macros defining the target labels
corresponding to integer values from LO to HI. Values outside this range should result in a jump to the
internal label defined by LBASE and LOFFSET.
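
The indexed jump described for TSWITCH amounts to a bounds check followed by a table lookup. The
sketch below models this in C with integer label numbers; the name tswitch_target and its parameter
list are assumptions for illustration and do not correspond to any routine in the compiler.

```c
#include <assert.h>

/* Return the label for `value`, taken from a table of hi-lo+1 labels
   (the LABCON macros), or `deflabel` (standing for the label given by
   LBASE and LOFFSET) when `value` lies outside the range lo..hi. */
static int tswitch_target(int value, int lo, int hi,
                          const int *labels, int deflabel)
{
    assert(lo <= hi && labels != 0);
    if (value < lo || value > hi)
        return deflabel;
    return labels[value - lo];
}
```

In contrast to a search through a case list, the time taken here does not depend on the number of
labels, at the price of one table entry for every integer between LO and HI.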

28. ZERO: %Z(I)

The ZERO macro specifies the definition of a block of storage initialized to zero; the size in bytes of this
storage area is specified by the integer I.






Appendix IV - The HIS-6000 Machine Description 



The machine description used in the HIS-6000 implementation is listed below. Much of its complexity is a
direct result of the fact that the HIS-6000 is not byte-addressed. In the macro definitions, the character
sequence '\n' represents the newline character.



typenames (char,int,float,double);

regnames (x0,x1,x2,x3,x4,a,q,f);

memnames (reg,auto,ext,stat,param,label,intlit,floatlit,stringlit,ix0,ix1,ix2,ix3,ix4,ia,iq);

size 1(char),4(int,float),8(double);

align 1(char),4(int,float),8(double);

class x(x0,x1,x2,x3,x4), r(a,q);

conflict (a,f),(q,f);

saveareasize 16;

pointer p0(1), p1(4);

returnreg q(int,p0,p1),f(double);

type char(r),int(r),float(f),double(f),p0(r),p1(x);



.sw: 


a B l[x4J 


♦pO: -pO: +i;-i*^ .OR: -pOpO: «: »: 


fMU 


+pl: 


MMx; 


-pi: 


xaI; 


—H:«&: -A: -OR: . 


M/,1; 


*i: /i: 


^WiJ 


+d: -d: *d: /d: 


fMf; 


%: 


qMatqJ 


■«: ■»: 


"WP "Hf*(»*I 


&u: 


vu 




auto|ext|stat|stringlit|iajiq M ri 


.BNOT: .ic: xi: 


rHi 


-ui: — bi: 


M»r; 


xf : xd: .if: .id: 


•Ji 


.fc: .dc: .fi: .di: 


f»qs 


.fd: 


M*f; 


.df: -ud: 


Uj 


.ipO: .pOi: 


r„l; 




H/; 


.ipl: .pOpl: 


TnXJ 




** 


.pli: 4>lp0: 


X„T! 




M^rj 


++bi: 


KU 


++ai: ~ai: ++bc: ++ac: 




— be: —ac: 


M„«[ql 




M^a) 


++bp: ~bp: 


MMrN 


++ap: ~ap: 


WWql 




*•**•> 




MMxj 


«0: !-0: <0: >0: <-0: >-0: 


r|V; 


«p: !«p: <p: >p: <«p: >-p: 


rtxMH 
.sw: "


TSX5 


•SWTW 


xi: 


"\V 




.cc: 






(auto w ): " 


EA«R 


0,7\n" 


(«tat„): " 


EA«R 


.STAT\«" 


Oa„q): " 


STA 


.TtMP 




LDQ 


.TIMRV^ 


(iq„a): " 


STQ 


.T6MP 




LDA 


.TEMP\n" 


(auto|stat|indiract„): 




"%}f<*X»T), 


AD*R 


&6(»T)VS) 




TSX5 


.CTtwr 


(•xt|stringlit M ): 






m 


LO«R 


■F 




•RRL 


27" 


(r„r): " 


EAaR 


0,aFL 




•RRL 


18" 


(r„auto|stat|indirect|stringlit): 


m 


EAXS 


0,aFL\rt" 


(r„auto): " 


EA*f 


0.7\n" 


(r„stat): " 


EAaF 


.STAT\n' 


(rnautolstat): "Wf(«o<«'W, 


AO«F * 




TSX4 


.•rtocr 


(r^trinflit): " 


EA«F 


- «R 




TSX4 


.aFTOQ" 


<r„ext): " 


•FLS 


27 




ST»F 


«R 




•FRL 


2r 


<q»i«): 






"Xif<%0<» , R), 


ADA 


XecKa^RyVn.) 




TSX4 


.ATOC" 


<« w iq): 






-Xif(Xo(«m 


AOQ 


XccKftTDVt,) 




TSX4 


.QTdC" 


Ji: 
<r„M>: " 


ST»F 


•R" 


(*V): " 


LD*R 


•r 


<r„r>: " 


LLR 


36" 


.ff: 






<f„M): " 


FSTR 


•R" 


(M„f): " 


FLD 


•F" 


.dd: 






<f„M): " 


DFST 


*R" 


<M„f>: " 


DFLD 


•F* 


-pOpO: 






(•V>: - 


LLR 


36" 


(fnM): " 


STaF 


•R" 


(M^r):" 


LD*R 


•r 



teetilflVv) 
.p1p1:






(x M x): " 


EA«R 


0,«3" 


<x„M>:" 


STZ 


•R 




ST«F 


•R" 


OyU):" 


LD*R 


»F" 


(x^q):- 


EAQ 


0,«3" 


<q*x)i" 


EA«R 


OflU" 


(M^q):" 


LDQ 


•r 


<q»M): " 


STQ 


•R" 


.pOpl: 






(r„x):" 


EA«R 


o,»Fir 


(M#h m 


LD«>R 


•r 


.plpO: 






<x„r): " 


EAoR 


0,«3" 


(M„r):" 


LD#R 


•p 


.ic: " 


AN«F 


-0377.DL" 


.ipO: 






(M„r): " 


LD«R 


•r 


(rjr): 


"\\" 




.ipl: 






<r*x): " 


EA«R 


0,«FU" 


.pOi: 






(M^r): " 


LO»R 


•r 


(r„r)s 


"W" 




•pli: 






<x„r): " 


EA*R 


0,«3" 


.fd: " 


FLD 


»n 


.df: 


"W 




xf: .cd: .if: .id: " 


LDQ 


0,01 




LDE 


«35B25,DU 




FNO" 




.fi: .di: " 


UFA 


-71B25AJ" 


.fc: .dc: " 


UFA 


-71B25,DU 




ANQ 


-0377,01" 


+i: "    AD#R    #S"

-i: "    SB#R    #S"

*i: "    MPY     #S"

/i: %i: "    DIV     #S"

+d: "    DFAD    #S"

-d: "    DFSB    #S"

*d: "    DFMP    #S"

/d: "    DFDV    #S"

=+i: "    AS#S    #R"
>>:






Ontlit,): " 
(,-tntHt^: - 


•FRS 
LXL5 

•frs 


foOTT 

•S 

0J5* 


«: 






gntlit,): " 
(,-intlit,): " 


•FLS 
LXL5 
•FLS 


Xotw'Sr 

•S 

03" 


•»: " 


LD*R 
•RRS 
ST«R 


•F 
0,«SL 

■»#"■• 




LO*R 
•RLS 
ST«R 


«F 
0,«SL 

•F" 


♦pO:" 


•FRS 
AD«F 

•as 


16 

•s 

16" 


-pO:" 


•FRS 
SB*F 
•FLS 


16 
•S 
16" 


♦pi: " 


LXL«1 
AOL«*R 


•S 

•F" 


-pi:" 


QLS 
STQ 
SBL«F 


18 

.TEMP 

.TEMT 


-ui: " 


LC»R 


*r 


-bh" 


LD«R 
SB«R 
ST«R 


•F 

-iA 

•r 


-ud:" , 


FNEG" 




♦+bi:" 


AOS 


•r 


♦+»i: " 


LDoR 
AOS 


•F 
•F" 


— ai: " 


LOA 
LOQ 
SBQ 
STQ 
SBA 
STA 


•F 

•F\n" 

-1.DL 

•r 
++bp:






<*x):" 


LD»R 


»F 




EA«R 


1WS)/4»1 




ST«R 


•r 


U):" 


LD#R 


«F 




ADL«R 


Xco^) 




ST«R 


•r 


~bp: 






U):" 


LD»R 


•F 




EA»R 


-Xo^/iJ,*! 




ST»R 


•F" 


Uh" 


LO«R 


•F 




SBL«R 


SCOT'S) 




ST«R 


«r 


++ap: 






U>:" 


LD»R 


•F 




EAX5 


Sod^/Ml 




STX5 


«F" 


(»»toh " 


LDA 


«F 




LDQ 


•F\n" 


(«a):" 


ADLQ 


Xco<»'$) 




STQ 


•r 


(»q): " 


ADLA 


Jctf©^) 




STA 


•r 


— ap: 






<*x): " 


LD«R 


•F 




EAX5 


-Xo(»'S)/4,«l 




STX5 


•r 


<»«|q>: " 


LDA 


•F 




LDQ 


•F\n" 


<*•>: " 


SBLQ 


*co<»'$) 




STQ 


*r 


(,*>:" 


SBU 


fco^'S) 




STA 


•F" 


.BNOT: " 


ER»F 


—1" 


&u: 






(ia|iq„r): 






"Xlf<Xo(»'F), 


ADL»F 


Xco(»T)\rg\Y 


(i«„q): " 


LLR 


36" 


<iq„a): " 


LLR 


36" 


(auto|stat„r): " 


EA»R 


XnC«3,0) 


«if(%o<#T), 


ADL«R 


*co(i»T)\n,)\\ 


(ext|stringlit„r): " 


EA«R 


•r 


<»x):" 


EA»R 


«F" 


«:" 


AN»F 


«S" 


-&:" 


ANS»S 


»r 


A:" 


ER«F 


•S" 


-a:" 


ERS«S 


•r 


.OR:" 


OR»F 


»S" 


-OR:" 


ORS«S 


•r 






==p: "    CMP#F   #S
    TZE     #R"

!=p: "    CMP#F   #S
    TNZ     #R"

<p: "    CMP#F   #S
    TZE     *+2
    TNC     #R"

>p: "    CMP#F   #S
    TZE     *+2
    TRC     #R"

<=p: "    CMP#F   #S
    TZE     #R
    TNC     #R"

>=p: "    CMP#F   #S
    TRC     #R"

)c: 

W): " DFCMP -000\n" 

U):" CMP«R 0,DL\n" 

"x»ic(«o^i2r 

TpOpO:" S8UF *S 

«FRL 16" 

W:"6 QMAP* 

jmp: " TRA «0" 



•n: " SYMDEF «0 



i» 



•*:" SYMREF «0" 

st:" SYMREF iWLG^EPILG,TEMP^WTCH 

symref xrroMnooAToc^ec 

.STAT EQU *" 

p:"%ido(«l) EQU * 

TSXO .PROLG 

ZERO .FS«0" 

cos "-V20/ol,16/O" 

cr. " TSX1 «F 

ZERO »1/4,«K>" 

rt: - TRA .EPILG" 

«P=" TRA £PJLG 

.FS«0 EQU #1/4" 

gOi" TRA *R" 
cpq:






(auto*): " 


EAQ 


0,7\n" 


<stat„): " 


EAQ 


.STAT\n" 


(!•*>:" 


LLR 


36\n" 


(•utolstatlindirect,): 




"%jf<Xo<«T>, 


AOQ 


toXtTMV 


♦+bc: 






(auto|stat|indir»ct„): 




-%cpq(Oflfl,«T) 








STQ 


.TEMP 




LDA 


.TEMP 




TSX5 


.CTOA 




ADA 


UX. 




ANA 


-0377.DL 




EAX5 


0.AL 




T$X| 


.QTQC- 


(•xp 4 * 


USA 


•f 




ADA 


•oiooopu 




STA 


•r 


—be: 






(•utoM»t|indirectJ: 




"XcpqWAfcaT) 








STQ 


.TEMP 




LDA 


.TEMP 




TSX5 


.CTOA 




SBA 


1.DL 




ANA 


-0377.DL 




EAX5 


0,AL 




TSX4 


.QTOC" 


(•xt B H" 


LDA 


•F 




SBA 


-01000.DU 




STA 


•r 


++ac: 






(auto|stat |indirect„): 




"%cpq(0,0,0,»T) 








STQ 


.TEMP 




LDA 


.TEMP 




TSX5 


.CTOA 




EAX5 


1.AL 




TSX4 


.QTOC- 


(ext w ): " 


LDA 


•F 




LDQ 


•F 




ADQ 


-O1000.DU 




STQ 


•r 



m\>mmmMwuhmwMmiLMum4 



■& 









•TEMP 


(HA 


■ ■,*.?5PB^ 


Timit 

.*nw . 


JdQA 


■'•' ; i*il:. : -: 


■: .'-a&lfc 


■ ■ 'fWi • 


QflW 


feirUs- U3A 


«r 


LOQ 


*F . 



Appendix V - The HIS-6000 C Routine Macro Definitions

The C routine macro definitions used in the HIS-6000 implementation are listed on the following pages. A
C routine macro definition is written as a C function returning a character string value. This character
string is "substituted" for the macro call and rescanned by the macro expander; thus, it may contain
references to its arguments and embedded macro calls. The arguments passed to the C routine are ARGC
and ARGV: ARGC is an integer specifying the number of arguments present in the
associated macro call; ARGV is an array of pointers to those arguments.

When the following routines were written, the formatted print routine PRINT was capable of producing
output only onto a file and not into a string in core; thus, where formatting is necessary, these routines
print their output directly and return the null string. Although there are dangers inherent in this practice,
in these cases the effect is the same as if the formatted string were returned and printed normally. The
character sequences '\t', '\n', and '\\' represent tab, newline, and backslash, respectively.
char *fn[]

    {"in","c","f","nf","d","nd","eq","jc",
     "ad","z","i","sr","end","ln","al","n",
     "other","if"};

char (*ff[])()

    {aint,achar,afloat,anegf,adouble,anegd,aequ,ajc,
     aadcon,azero,aidn,astring,aend,aln,aalign,aname,
     other,aif};

int nfn 18,
    lineno 0,
    mflag 0,
    packb[4],
    packno;

char *aln(argc,argv) int argc; char *argv[];

    {lineno=atoi(argv[0]);
    packf();
    return(".N#0 EQU *");
    }

char *aequ(argc,argv) int argc; char *argv[];

    {packf();
    return("#0 EQU *");
    }

char *aint(argc,argv) int argc; char *argv[];

    {packf();
    return("\tDEC\t#0");
    }

char *achar(argc,argv) int argc; char *argv[];

    {if (argc>0) packc(atoi(argv[0]));
    return("\\");    /* conceal following newline */
    }

char *afloat(argc,argv) int argc; char *argv[];

    {packf();
    if (argc>0) print("\tDEC\t%m",atoi(argv[0]));
    return("");
    }

char *adouble(argc,argv) int argc; char *argv[];

    {
    packf();
    if (argc>0)
        {print("\tDEC\t");
        return(adblc(atoi(argv[0])));
        }
    return("");
    }

char *anegf(argc,argv) int argc; char *argv[];

    {packf();
    if (argc>0) print("\tDEC\t-%m",atoi(argv[0]));
    return("");
    }

char *anegd(argc,argv) int argc; char *argv[];

    {
    packf();
    if (argc>0)
        {print("\tDEC\t-");
        return(adblc(atoi(argv[0])));
        }
    return("");
    }

char *astring(argc,argv) int argc; char *argv[];

    {auto int c,f,lc;
    auto char *cp;

    lc=0;    /* location counter in STRING file */
    f=xopen(pname,fn_string,READ_BINARY);

    while(1)
        {packf();
        c=cgetc(f);
        if(ceof(f)) break;
        print(".S%d\tEQU\t*\n",lc);
        lc++;
        while(1)
            {if (c=='\\')
                {c=cgetc(f);
                lc++;
                if (c=='0') c='\0';
                packc(c);
                }
            else
                {packc(c);
                if (!c) break;
                }
            c=cgetc(f);
            lc++;
            }
        }
    cclose(f);
    return("\\");
    }

char *aend(argc,argv) int argc; char *argv[];

    {packf();
    return("\tEND");
    }

char *regnames[] {"X0","X1","X2","X3","X4","A","Q",""};

char *aname(argc,argv) int argc; char *argv[];

    {auto int base,offset;

    if (argc>1) offset=atoi(argv[1]); else offset=0;
    if (argc>0) base=atoi(argv[0]); else base=0;
    if (mflag) cprint("ANAME(%d,%d)\n",base,offset);
    if (base>=0) return(regnames[base]);
    base = -base;
    if (base >= c_indirect)
        {print("%d,%d",offset/4,base-c_indirect);
        goto check;
        }
    else switch(base) {

    case c_auto:
        print("%d,7",offset/4);
        goto check;
    case c_extdef:
        return("%i(#1)");
    case c_static:
        print(".STAT+%d",offset/4);
        goto check;
    case c_param:
        print("%d,6",offset/4);
        goto check;
    case c_label:
        print(".L%d",offset);
        break;
    case c_integer:
        if (offset<0 || offset>32000) print("=%d",offset);
        else print("%d,DL",offset);
        break;
    case c_float:
        print("=%s",adblc(offset));
        break;
    case c_string:
        print(".S%d",offset);
        break;
        }
    return("");
check:
    if (offset%4) error(6025,lineno);
    return("");
    }

/*
    AALIGN - align location counter
*/

char *aalign(argc,argv) int argc; char *argv[];

    {
    switch(atoi(argv[0])) {
    case ct_double:
        packf();
        return("\tEVEN");
        }
    return("\\");
    }

/*
    AJC - emit conditional jump
*/

char *ajc(argc,argv) int argc; char *argv[];

    {auto int cond;
    cond=atoi(argv[0]);

    switch(cond) {
    case cc_eq0:    return("\tTZE\t#1");
    case cc_ne0:    return("\tTNZ\t#1");
    case cc_lt0:    return("\tTMI\t#1");
    case cc_ge0:    return("\tTPL\t#1");
    case cc_gt0:    return("\tTZE\t*+2\n\tTPL\t#1");
    case cc_le0:    return("\tTZE\t#1\n\tTMI\t#1");
        }
    return("");
    }

char *other(argc,argv) int argc; char *argv[];

    {switch(atoi(argv[0])) {
    case 5: return("Q");
    case 6: return("A");
        }
    return("BAD");
    }

char *aif(argc,argv) int argc; char *argv[];

    {return(atoi(argv[0])?"#1":"#2");
    }

/* PACK CHARACTERS INTO WORDS */

packc(i) int i;

    {
    packb[packno++]=i;
    if (packno>=4)
        {print("\tVFD\t9/%d,9/%d,9/%d,9/%d\n",
            packb[0],packb[1],packb[2],packb[3]);
        packno=0;
        }
    }

packf()

    {
    while(packno!=0) packc(0);
    }



char *aadcon(argc,argv) int argc; char *argv[];

    {packf();
    return("\tZERO\t#0");
    }

char *azero(argc,argv) int argc; char *argv[];

    {auto int i,j;

    if (argc>0)
        {i=atoi(argv[0]);
        while(packno && i) {packc(0);i--;}
        j = i/4; i =% 4;
        if (j>0) print("\tBSS\t%d\n",j);
        while(i--) packc(0);
        }
    return("\\");
    }

char *aidn(argc,argv) int argc; char *argv[];

    {auto char *cp1,*cp2;
    static char n[7];
    auto int i,c;

    if (argc>0)
        {cp1 = &cstore[atoi(argv[0])];
        cp2 = n;
        for(i=0;i<6;i++)
            {c = *cp1++;
            if (c == '_') c = '.';
            *cp2++=c;
            }
        *cp2='\0';
        return(n);
        }
    return("");
    }

adblc(i)

    {auto char *cp1,*cp2;
    static char buf[30];
    auto int c,flag;

    flag=FALSE;
    cp1 = &cstore[i];
    cp2 = &buf[0];

    while(c = *cp1++)
        {if (c == 'E')
            {flag=TRUE;
            c = 'D';
            }
        if (cp2 < &buf[27])
            *cp2++ = c;
        }
    if (!flag)
        {*cp2++ = 'D';
        *cp2++ = '0';
        }
    *cp2++ = '\0';
    return(&buf[0]);
    }
Appendix VI - Overall Description of the Compiler

The compiler consists of four major phases. First, the lexical analysis phase (C1) transforms the source
program into a string of lexical tokens such as identifiers, constants, and operators. Second, the syntactic
analysis phase (C2) parses the token string and produces a tree representation of each function
(procedure) defined in the source program. Third, the code generation phase (C3) transforms the trees
produced by the syntactic analysis phase into an intermediate language program consisting of a sequence
of macro calls representing instructions of the particular abstract machine defined by the implementer.
Finally, the macro expansion phase (C4) expands the macro calls, producing an object language program
as the output of the compiler. In addition, there is an error message editor (C5) which is invoked last in
order to format any error messages produced by the other phases. The phases of the compiler are
invoked in sequence by the control program (CC). The control program communicates with the various
phases by passing as arguments to an invoked phase a set of character strings representing file names
and an option list; the invoked phase returns a completion code which indicates whether or not any
serious or fatal errors occurred during the execution of that phase. The various phases communicate
with each other using intermediate files.

The lexical and syntax analysis phases may be run sequentially as described above, or, where a system's
program size restrictions permit, may be combined into a single phase, thus eliminating the use of an
intermediate file. This option is implemented through the use of compile-time conditionals. The remainder
of this chapter will assume that the two phases are separate.

1. The Lexical Analysis Phase 

The lexical analyzer reads in the source program and breaks it into a string of tokens such as identifiers, 
constants, and operators. The lexical analyzer also interprets compile-time control lines which allow one 
to include source from other files and to define manifest constants. The lexical analyzer produces output 
onto three intermediate files: the TOKEN file, which contains the string of tokens, the CSTORE file, which 
contains the source representations of identifiers and floating-point constants, and the STRING file, which 
contains character string constants. The TOKEN file is passed to the syntax analysis phase; the CSTORE 
and STRING files are not used until macro expansion. In addition, the lexical analyzer may write error 
messages in an internal form onto the ERROR file. A token is represented by a pair of integers called the 
TYPE and the INDEX of the token. The syntax analyzer performs its analysis on the basis of the token 
TYPE; thus most operators have a distinct TYPE, and there are separate TYPEs for identifiers, integer 
constants, floating-point constants, and character string constants. The INDEX is used to distinguish 
particular identifiers or constants; for example, the INDEX of an identifier is the index of the source 
representation of the identifier in the array of characters written onto the CSTORE file. 
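
The TYPE and INDEX pair can be pictured with a small C sketch. Everything named here is hypothetical:
the enumeration values, the struct token layout, the fixed CSTORE_SIZE, and the helper functions are
assumptions chosen for illustration, and the sketch simply appends every identifier to the character
array without checking whether it is already present.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token TYPEs; the real compiler has one per operator as well. */
enum tok_type { T_IDN, T_INTCON, T_FLOATCON, T_STRCON };

/* A token is a pair of integers: its TYPE and its INDEX. */
struct token { int type; int index; };

#define CSTORE_SIZE 256
static char cstore[CSTORE_SIZE];   /* source representations, NUL-terminated */
static int cstore_used = 0;

/* Append `name` to cstore; return its offset, or -1 if it does not fit. */
static int cstore_add(const char *name)
{
    size_t len = strlen(name) + 1;
    int off = cstore_used;

    if (len > (size_t)(CSTORE_SIZE - cstore_used))
        return -1;
    memcpy(cstore + off, name, len);
    cstore_used += (int)len;
    return off;
}

/* Build the token for an identifier: INDEX is its offset in cstore. */
static struct token make_idn(const char *name)
{
    struct token t;

    t.type = T_IDN;
    t.index = cstore_add(name);
    return t;
}
```

A later phase that receives only the INDEX can recover the spelling of the identifier as the
NUL-terminated string beginning at &cstore[index].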

The main routine of the lexical analyzer consists of a loop which calls a routine GETTOK to return the 
next token in the input stream and then writes the token onto the TOKEN file. This loop also contains 
code to interpret compile-time control lines. GETTOK obtains input characters from a routine LEXGET 
which contains the logic to switch the input between the primary source file and "included" files. Except 
when processing character string constants, GETTOK translates the input characters using a translation 
table. On GCOS, this translation maps lower case into upper case, tabs into blanks, and carriage returns 
into newlines. This table would be changed when moving the compiler to a system using other than the
ASCII character set. GETTOK partitions the character set into the following character classes: 
1. letters

2. digits 

3. apostrophe (') 

4. quotation mark (") 

5. newline 

6. blank 

7. period (.) 

8. the escape character (\) 

9. invalid characters 

10. characters which are unambiguously single- 
character operators (such as '{') 

11. characters which may begin a multi-character 
operator (such as '<' which may begin '<=')

GETTOK uses the character class of the current input character to determine its actions in analyzing the 
input string. 
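
A partition of this kind is naturally expressed as a function from characters to a small enumeration.
The sketch below is one possible rendering in C and is not taken from GETTOK: the names of the
enumeration members, the treatment of the underline as a letter, and the particular choice of which
operator characters fall into the last two classes are all assumptions. Characters such as tab are
assumed to have been mapped to blank by the translation table before classification.

```c
#include <assert.h>
#include <ctype.h>

enum cclass {
    C_LETTER, C_DIGIT, C_APOS, C_QUOTE, C_NEWLINE, C_BLANK,
    C_PERIOD, C_ESCAPE, C_INVALID, C_SINGLE, C_MULTI
};

/* Return the character class of `c` (ASCII assumed). */
static enum cclass classify(int c)
{
    if (c < 0 || c > 127)
        return C_INVALID;
    if (isalpha(c) || c == '_')
        return C_LETTER;
    if (isdigit(c))
        return C_DIGIT;
    switch (c) {
    case '\'': return C_APOS;
    case '"':  return C_QUOTE;
    case '\n': return C_NEWLINE;
    case ' ':  return C_BLANK;
    case '.':  return C_PERIOD;
    case '\\': return C_ESCAPE;
    /* characters assumed to be complete operators by themselves */
    case '{': case '}': case '(': case ')': case '[': case ']':
    case ';': case ',': case '?': case ':': case '~':
        return C_SINGLE;
    /* characters assumed to be able to begin a longer operator */
    case '<': case '>': case '=': case '!': case '+': case '-':
    case '*': case '/': case '%': case '&': case '|': case '^':
        return C_MULTI;
    default:
        return C_INVALID;
    }
}
```

A scanner built on such a function can dispatch on the class of the current character with a single
switch statement rather than testing individual characters.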

2. The Syntax Analysis Phase 

The syntax analyzer accepts as input the token string generated by the lexical analyzer and produces 
output onto three intermediate files for the code generation phase: a tree representation of each function 
defined in the source program is written onto the NODE file; a symbol table containing declarative 
information about identifiers is written onto the SYMTAB file; and information regarding specified initial 
values of variables is written onto the INIT file. 

The main routine of the syntax analysis phase is a table-driven LALR(1) parser. The tables are generated
by a parser-generator YACC, written by S. C. Johnson [18]. The input to YACC is a BNF-like description 
of the syntax of C, augmented by action routines which are to be invoked by the parser when particular 
reductions are made. YACC analyzes the grammar and produces a set of tables written in C which are 
then compiled into the syntax analysis phase. 

The tables produced by YACC represent instructions to the parser to test the TYPE of the current input 
token, to shift the current input token onto the stack, to perform a reduction and call an action routine, or 
to report a syntax error. When a syntax error is discovered, the parser writes error messages onto the 
ERROR file which give the current state of the parse. It then attempts to recover from the error so that 
any additional syntax errors in the program can meaningfully be reported. The parser attempts a 
recovery by popping states from the stack and/or skipping input tokens in various combinations. A 
recovery attempt is considered successful if the next five input tokens are shifted without detecting a 
new syntax error. If a recovery attempt is successful, error messages are written which describe the 
recovery actions taken and parsing is continued. If a successful recovery cannot be made within a limited 
region of the input program, the parser ceases execution after writing an error message. 

The following C program illustrates the compiler's response to a syntax error, in this case unmatched 
parentheses: 

int c;

int f(file)

    {if ((c=getc(file) != 0) return(-1);

    return(0);

    }

The first error message, listed below, gives the state of the parse when the syntax error was discovered, 
followed by a cursor symbol '_', followed by the next five input tokens. The next error message indicates 
that the parser was able to recover from the error by skipping the next two input tokens. The resulting 
program, although syntactically correct, is meaningless. Therefore, in order to avoid extraneous error 
messages, the code generation phase and the macro expansion phase are not executed after syntax
errors have been detected. 

3: SYNTAX ERROR. PARSE SO FAR: <ext_def_list> <function_dcl>
<block_head> IF ( <e> _ RETURN ( - 1 )
3: SKIPPED: RETURN (

The following program also contains a syntax error due to unmatched parentheses; however, since there
are no more right parentheses in the statement following the point where the error is detected, the
parser recovers from the error by deleting the unfinished IF clause.

int c;

int f(file)

    {if ((c=getc(file)==0) c = -1;

    return(c);

    }

3: SYNTAX ERROR. PARSE SO FAR: <ext_def_list> <function_dcl>
<block_head> IF ( <e> _ C = - 1 ;
3: DELETED: IF ( <e>

The following program is an example of a syntax error from which the parser could not recover within its
allowed limits; thus, after skipping input tokens up to this limit, the parser gives up.

int c;

int f(file)

    {if ((c=getc(file) != 0) c = 1;

    else c = 0;

    return(c);

    }

3: SYNTAX ERROR. PARSE SO FAR: <ext_def_list> <function_dcl>
<block_head> IF ( <e> _ C = 1 ; ELSE
3: SKIPPED: C = 1 ;
4: I GIVE UP

3. The Code Generation Phase 

The code generation phase performs the following operations: (1) allocates storage for (determines the 
run-time locations of) variables, (2) performs type checks on operands and inserts conversion operators 
where necessary, (3) translates the tree representation of expressions into a more descriptive form with 
AMOPs, (4) performs some machine-independent optimizations on expressions, (5) emits macro calls to
define names which may be referenced by other programs (ENTRY symbols) and to declare names which
are assumed to be defined in other programs (EXTRN symbols), (6) emits macro calls to define and
initialize variables, (7) emits macro calls to execute the control statements of each function defined in the
source program, and (8) emits macro calls to evaluate expressions. 

The code generation phase reads the NODE, SYMTAB, and INIT files produced by the syntax analysis
phase and writes an intermediate language program in the form of macro calls onto two intermediate files,
the MAC file and the HMAC file. The HMAC file contains the macro calls defining ENTRY symbols and
EXTRN symbols which are produced last by the code generation phase but which, in some systems, may
be required to appear at the beginning of the assembly language program. The MAC file contains the 
remainder of the intermediate language program. 

The main routine of the code generation phase consists of a call to a routine SALLOC, which allocates
run-time storage and emits macro calls to define and initialize variables, followed by a loop which reads in the
tree representation of a single C function from the NODE file and generates code (macro calls) for that 
function, followed by a call to a routine SDEF which emits macro calls to define ENTRY and EXTRN 
symbols. 

The generation of code for a C function begins with a call to a routine FHEAD with the name of the 
function as an argument. FHEAD emits a PROLOG macro call which defines the entry point and produces 
code to set up the proper run-time environment. FHEAD then allocates storage in the run-time stack 
frame for the automatic variables of the function; storage is allocated for automatic variables in order of 
decreasing alignment requirement so that no space is wasted in the stack frame. The stack frame is 
assumed to be aligned according to the strictest of the alignment requirements of the various C data 
types (usually that of double-precision floating-point). A save area of the size specified in the machine 
description is reserved at the beginning of the stack frame. 
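
The allocation rule just described (save area first, then automatic variables in order of decreasing
alignment) can be sketched in C as follows. This is an illustration under assumptions, not the FHEAD
routine: the struct autovar record, the function names, and the convention that offsets grow upward
from the start of the frame are all hypothetical. The final rounding of the frame size to a multiple of
the stack alignment is included for completeness.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record for one automatic variable. */
struct autovar { int size; int align; int offset; };

/* qsort comparison: larger alignment requirement first. */
static int by_align_desc(const void *a, const void *b)
{
    const struct autovar *x = a, *y = b;
    return y->align - x->align;
}

/* Round n up to the next multiple of align (align > 0). */
static int round_up(int n, int align)
{
    return (n + align - 1) / align * align;
}

/* Assign an offset to each variable, after a save area of `savearea`
   bytes, and return the frame size rounded to `stack_align`. */
static int alloc_frame(struct autovar *v, size_t n, int savearea, int stack_align)
{
    size_t i;
    int off = savearea;

    assert(savearea >= 0 && stack_align > 0);
    qsort(v, n, sizeof v[0], by_align_desc);
    for (i = 0; i < n; i++) {
        off = round_up(off, v[i].align);   /* no-op when sizes are multiples of alignments */
        v[i].offset = off;
        off += v[i].size;
    }
    return round_up(off, stack_align);
}
```

With the HIS-6000 sizes (1 for char, 4 for int, 8 for double) and a 16-byte save area, a function
declaring a char, a double, an int, and another char in that order would need 40 bytes if the variables
were laid out in declaration order with padding, but only 32 bytes when sorted as above.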

The call to FHEAD is followed by a call to the routine STMT to generate code for the compound statement 
which is the body of the C function. The generation of code for the body of a C function occurs on two 
levels, the statement level and the expression level. The generation of code for statements is handled by 
the routine STMT which takes one argument, a pointer to a subtree representing a C statement. STMT is 
actually a very short routine which makes recursive calls to itself for the branches of a STATEMENT_LIST 
node and calls a larger routine ASTMT if the specified node is an actual statement (as opposed to a 
statement list). The purpose of splitting code generation for statements into the two routines STMT and 
ASTMT is to minimize the amount of stack space used while recursively descending the statement tree. 

Following the call to STMT to generate code for the body of the C function, the size of the stack frame is 
adjusted to be a multiple of the stack alignment and an EPILOG macro call is emitted. On the HIS-6000, 
the EPILOG macro defines an assembly-language symbol whose value is the stack frame size; this symbol 
is referred to by the code produced by the PROLOG macro which allocates the stack frame. 

4. The Macro Expansion Phase 

The macro expansion phase expands the macro calls on the HMAC and MAC intermediate files using the 
information on the CSTORE and STRING intermediate files and places the result of that expansion on the 
output file. The macro expander is not a general-purpose macro processor; in particular, there are no
built-in macro calls for defining macros or for handling local or global variables. Furthermore, the total 
number of characters (after any macro expansion) in the argument list of a macro call is limited to 100. 
The maximum allowed depth of nested macro calls is 10. 

The macro expander processes a stream of characters terminated by a NULL character. Within this 
stream of characters, the characters '%', '#', and '\' have special significance. The '%' character indicates 
the beginning of a macro call, which consists of the '%', followed by the name of the macro, followed by a 
(possibly null) list of character string arguments separated by commas and enclosed in parentheses. The 
'#' character is used within the body of a macro definition to refer to the arguments of the macro call; the 
character sequences '#0' through '#9' refer to arguments 0 through 9, respectively. The '\' character is 
an escape character. The special interpretation of a character such as '%', '#', ')' or ',' is inhibited when 
that character is preceded by a '\'. In addition, the character sequences '\t', '\n', and '\r' are used to 
represent tab, newline, and carriage-return, respectively. A '\' character followed by a newline character 
results in both characters being ignored; thus a macro which expands to a backslash will swallow the 
newline which followed the macro call in the input file. (A macro call in the input file which expands to 
the null string will leave a blank line in the compiler output; this is generally a sign that the implementer 
has not completely specified the macro definition for an AMOP.) The backslash character itself is 
represented by '\\'. 

The normal operation of the macro expander consists of copying characters directly from the input stream 
to the output stream. When a '%' is encountered, the name of the macro and the arguments of the macro 
call are evaluated and collected in a buffer; this evaluation may itself involve the processing of embedded 
macro calls. The input stream is then switched to the body of the macro definition and normal processing 
is resumed. When a '#' is encountered, the argument number is read and the input stream is switched to 
the corresponding character string argument of the current macro call, which is stored in the associated 
buffer. Normal processing is then resumed. The input stream operates in a stack-like manner in that 
when the end of a macro definition or an argument string is reached, the input stream is restored to its 
previous state. When end of file is reached on the HMAC file, the input stream is switched to the MAC 
file; when end of file is reached on the MAC file, macro expansion is terminated. 

There are three types of macros which are handled by the macro expander. First, there are the macros 
representing three-address abstract machine instructions, which are produced by the code generator 
while processing expressions. These macros are defined only in the machine description; the macro calls 
are of a special form which directly specifies the internal number of the corresponding macro definition, 
as assigned by GT. For example, the macro call %3 refers to macro definition number 3. Second, there 
are the keyword macros which are produced by the code generator while processing function definitions 
and statements. These macros may be defined either in the machine description or by C routines; the 
macro calls specify the macro names as given in Appendix III. Finally, there are the macros which are 
created by the implementer and used within other macro definitions. These macros may be defined either 
in the machine description or by C routines; the macro calls specify the macro name as defined by the 
implementer. 

A macro which is defined in the machine description is specified as a list of one or more character string 
constants, possibly with associated location prefixes for conditional expansion. Such a macro definition is 
implemented as a list of pointers to the character string constants, along with associated integers 
representing the conditions specified in the location prefixes, if any. The lists are accessed through an 
array MACDEF, produced by GT, which is indexed by the internal macro definition numbers assigned by 
GT to each macro definition in the machine description. As mentioned above, a macro call representing a 
three-address abstract machine instruction directly specifies the macro definition number. Other macros 
defined in the machine description are represented in a table produced by GT which associates the macro 
names with the corresponding macro definition numbers. 

Macros defined by C routines are represented in a table provided by the implementer which associates 
the macro names with the corresponding C functions. This table consists of an array FN of pointers to 
the character string macro names, an array FF of pointers to the corresponding C functions, and an 
integer NFN specifying the number of entries in the table. It would be more convenient for the 
implementer to specify the C macro definitions in the machine description and let GT construct NFN, FN, 
and FF; however, this was not done because of the lexical difficulties with including C source in 
the machine description. 
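The table of C routine macros might look like the following sketch. The array names FN, FF, and NFN come from the text (written in lower case here); the function signature macfn, the two sample macros, and the linear search in find_cmacro are assumptions for illustration, since the macro expander itself looks names up in a hash table.

```c
#include <stddef.h>
#include <string.h>

/* Assumed signature: a macro routine receives the argument strings of
   the call and returns the string that replaces the call. */
typedef const char *(*macfn)(const char *argv[]);

/* Two hypothetical C routine macros. */
const char *m_skip(const char *argv[])  { (void)argv; return "\tNOP\n"; }
const char *m_first(const char *argv[]) { return argv[0]; }

const char *fn[] = { "skip", "first" };  /* FN: macro names */
macfn ff[]       = { m_skip, m_first };  /* FF: corresponding functions */
int nfn          = 2;                    /* NFN: number of entries */

/* Return the C function defining the named macro, or NULL if none. */
macfn find_cmacro(const char *name)
{
    int i;

    for (i = 0; i < nfn; i++)
        if (strcmp(fn[i], name) == 0)
            return ff[i];
    return NULL;
}
```

Keeping the names and the functions in parallel arrays means the implementer must add an entry in both places and update the count, which is the inconvenience the text mentions.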

The macro expander is implemented as two levels of get-character routines. The lower level routine, 
GETC1, returns the next character from the current input source which may be either the input file 
(HMAC or MAC intermediate file) or a character string in memory. If it is a character string, it may be 
part of a definition of a macro specified in the machine description, an argument of the current macro call, 
or the result returned by a C routine macro definition. The current state of the input stream is kept in a 
stack of structures called input control blocks (ICBs); GETC1 uses the top ICB on the stack to determine 
the source of the next character. The members of an ICB structure are listed below with their meanings: 

     F          a flag indicating the type of the current input source (the input file, a macro 
                defined in the machine description, or a character string) 

     LOCP       if the current input source is a macro defined in the machine description, this is a 
                pointer to the current position in the list containing the pointers to the character 
                strings which make up the macro definition 

     CP         if the current input source is not the input file, this is a pointer to the next 
                character in the current character string 

     ARGV[10]   an array of pointers to the character string arguments of the current macro call 

     BASE[3]    the REF.BASEs of the result, the first operand, and the second operand of the 
                current macro call, used when computing conditional expansion 

A NULL character indicates the end of a character string or end-of-file on an input file; thus if the current 
input character is NULL, GETC1 updates the current state of the input stream by advancing LOCP or by 
popping an ICB off the stack or by switching the input file from the HMAC to the MAC intermediate file. 
GETC1 returns the NULL character only upon end-of-file on the MAC intermediate file. 
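The behavior of GETC1 can be sketched as below. The member names follow the ICB description; everything else is an assumption made to keep the example self-contained: the two intermediate files are modeled as in-memory strings, a macro definition is a NULL-terminated list of strings with no conditions attached, and the helper routines start, push_string, and push_macro are hypothetical.

```c
#include <stddef.h>

#define MAXDEPTH 10                      /* nesting limit from the text */

enum source { SRC_FILE, SRC_MACRO, SRC_STRING };   /* values of F */

struct icb {
    enum source f;          /* F: type of the current input source */
    const char **locp;      /* LOCP: position in a macro's string list */
    const char *cp;         /* CP: next character of the current string */
    const char *argv[10];   /* ARGV: arguments of the current macro call */
    int base[3];            /* BASE: used for conditional expansion */
};

struct icb icbs[MAXDEPTH + 1];
int top;                                 /* index of the top ICB */
const char *files[2];                    /* stand-ins for HMAC and MAC */
int curfile;

void start(const char *hmac, const char *mac)
{
    files[0] = hmac;
    files[1] = mac;
    curfile = 0;
    top = 0;
    icbs[0].f = SRC_FILE;
    icbs[0].cp = hmac;
}

int push_string(const char *s)
{
    if (top == MAXDEPTH)
        return -1;
    top++;
    icbs[top].f = SRC_STRING;
    icbs[top].cp = s;
    return 0;
}

int push_macro(const char **list)        /* list[0] must not be NULL */
{
    if (top == MAXDEPTH)
        return -1;
    top++;
    icbs[top].f = SRC_MACRO;
    icbs[top].locp = list;
    icbs[top].cp = list[0];
    return 0;
}

int getc1(void)
{
    for (;;) {
        struct icb *p = &icbs[top];

        if (*p->cp != '\0')
            return (unsigned char)*p->cp++;
        if (p->f == SRC_MACRO && *++p->locp != NULL)
            p->cp = *p->locp;            /* next string of the definition */
        else if (p->f != SRC_FILE)
            top--;                       /* end of macro or string: pop */
        else if (curfile == 0)
            p->cp = files[++curfile];    /* HMAC exhausted: go on to MAC */
        else
            return '\0';                 /* end of the MAC file */
    }
}
```

The loop retries after every state change, so the caller sees a NULL character only when the last input source is exhausted.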

The higher level get-character routine is MGET, which implements the '#', '%' and '\' conventions. MGET 
begins by calling GETC1 to obtain a character. If the character returned is a backslash, then GETC1 is 
called again to obtain the second character of the escape sequence and the appropriate action is taken: 
If the escape sequence is '\t', '\n', or '\r', then the character is taken to be tab, newline, or carriage 
return, respectively. If the second character is a newline, then it is ignored, and MGET returns the result 
of a recursive call to itself. Otherwise, the second character is returned as the value of MGET (thus it is 
protected from special interpretation). 
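The backslash handling just described can be sketched in isolation. In this illustration GETC1 is replaced by a trivial reader over an in-memory string, and the routine name mget_escape and its protected_char flag are hypothetical; the flag merely records that the returned character must not be given any special interpretation by the caller.

```c
const char *src;                         /* hypothetical in-memory input */

int getc1(void)
{
    return *src != '\0' ? (unsigned char)*src++ : '\0';
}

/* Return the next character with escape sequences resolved. */
int mget_escape(int *protected_char)
{
    int c = getc1();

    *protected_char = 0;
    if (c != '\\')
        return c;
    c = getc1();                         /* second character of the escape */
    switch (c) {
    case 't':  return '\t';
    case 'n':  return '\n';
    case 'r':  return '\r';
    case '\n': return mget_escape(protected_char);  /* both are ignored */
    default:
        *protected_char = 1;             /* shielded from interpretation */
        return c;
    }
}
```

Given the input a\tb followed by a backslash-newline pair and then c, the routine yields 'a', a tab, 'b', and 'c'; the backslash and the newline never reach the output.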

If the resulting character is not a '#' or a '%', then MGET returns that character directly. A '#' followed 
by a digit results in pushing a new ICB onto the stack pointing to the appropriate character string 
argument of the current macro call. A '#' followed by 'O', 'F', 'S', or 'R' (see Appendix I, section 3) results 
in a call to the C routine ANAME (which implements the NAME macro) with the appropriate arguments. 
When a '%' is encountered, the macro name is collected and the arguments are assembled into a 100- 
character buffer. The macro name and the arguments are obtained by recursive calls to MGET so that 
embedded macro calls are expanded; the result of expanding an embedded macro call may include commas 
or right parentheses without interfering with the argument structure of the macro call being processed. 
If the macro name is an integer, the correspondingly numbered macro definition from the machine 
description is used; otherwise, the macro name is looked up in a hash table containing the names of all 
defined macros. If the macro is defined in the machine description, a new ICB is pushed onto the 
stack with LOCP pointing to the beginning of the list of pointers to character strings which represents the 
macro definition. Otherwise, if the macro is defined by a C routine, the C function is called and an ICB is 
pushed onto the stack which points to the character string returned by that function; thus references to 
arguments and embedded macro calls in the string returned by the C function are processed. MGET then 
resumes normal operation by calling GETC1. Note that the effect of a call to an undefined macro is to 
replace the macro call by the null string; no error messages are produced by the macro expander. 

The main routine of the macro expander consists of initialization, including the setting up of the hash 
table, followed by a loop which calls MGET repeatedly and writes the returned character onto the output 
file; this loop terminates when the returned character is NULL. 

5. The Error Message Editor 

The error message editor is invoked as the last phase of the compiler to read from the ERROR 
intermediate file the error records written by the previous phases and to print error messages 
corresponding to those error records. The error message editor allows variable data, such as identifier 
names, to be included in the printed messages. In addition, error messages of arbitrary length can be 
constructed from a sequence of error records; the error message editor automatically breaks long output 
lines so that all output lines fit within a fixed page width. 

An error record is a structure containing seven integers: an error number, a line number, and five 
arguments. The error number selects a basic error message string which contains the fixed text of the 
error message and optional indicators for including variable data. An indicator is a two-character 
sequence beginning with a '%'; the character following the '%' defines the interpretation of the variable 
data which will replace the indicator when the string is printed. The variable data is specified by one or 
more of the arguments in the error record. The arguments are associated with the indicators from left to 
right; arguments are used as needed according to the interpretations specified by the indicators. The 
various indicators are listed below with their interpretations: 

     %d    print the next argument as a decimal integer 

     %m    print the string in the internal compiler table CSTORE which begins at the index 
           specified by the next argument 

     %n    print a string representing a node (operator) of the internal representation produced by 
           the syntax analysis phase, as specified by the next argument 

     %q    print a string representing the terminal or nonterminal symbol associated with the 
           parser state specified by the next argument 

     %t    print the source representation of the token whose TYPE and INDEX are specified by 
           the next two arguments 

     %%    print a '%' 

Only the arguments which are referenced by the basic error message string are specified when an error 
record is written; the values of the remaining arguments in the record are undefined. 
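The left-to-right pairing of indicators with arguments can be sketched as follows, assuming the indicator character is '%'. Only the decimal, CSTORE, and literal-percent indicators are handled, since the others depend on internal compiler tables. The structure layout, the sample CSTORE contents, and the routine name format_error are assumptions for the example, and the sketch relies on the ISO C library routine snprintf rather than anything in the compiler itself.

```c
#include <stdio.h>
#include <string.h>

#define NARGS 5

struct errrec {
    int number;             /* error number: selects the message string */
    int lineno;             /* source line, or 0 for a continuation */
    int arg[NARGS];         /* variable data, consumed left to right */
};

const char cstore[] = "count\0total";    /* hypothetical CSTORE contents */

static int nextarg(const struct errrec *r, int *used)
{
    return *used < NARGS ? r->arg[(*used)++] : 0;
}

/* Expand the indicators in msg into out, which holds n characters. */
void format_error(const char *msg, const struct errrec *r,
                  char *out, size_t n)
{
    int used = 0;           /* number of arguments consumed so far */
    size_t len = 0;

    if (n == 0)
        return;
    out[0] = '\0';
    while (*msg != '\0' && len + 1 < n) {
        if (msg[0] != '%' || msg[1] == '\0') {
            out[len++] = *msg++;         /* fixed text is copied as is */
            out[len] = '\0';
            continue;
        }
        switch (msg[1]) {
        case 'd':
            snprintf(out + len, n - len, "%d", nextarg(r, &used));
            break;
        case 'm':
            snprintf(out + len, n - len, "%s", cstore + nextarg(r, &used));
            break;
        case '%':
            snprintf(out + len, n - len, "%%");
            break;
        default:            /* other indicators need compiler tables */
            break;
        }
        len += strlen(out + len);
        msg += 2;
    }
}
```

With arguments 3 and 6, a message string containing a decimal indicator followed by a CSTORE indicator would print the integer 3 and then the string that begins at index 6 of the sample table.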

The line number field in the error record associates a line in the source program with the error which 
produced a particular error record. If a line number is given (LINENO > 0), it is printed out on a new line, 
followed by a colon, followed by the text specified by the error record; otherwise (LINENO = 0), the text 
specified by the error record is printed on the current line. Thus an error message consists of an initial 
error record containing a line number followed by zero or more error records without line numbers. In 
this manner, an error message of arbitrary length can be constructed. For example, the message reporting 
the state of the parse when a syntax error is discovered (see the discussion of the syntax analysis phase) 
is constructed from the following basic error message strings: 

     "SYNTAX ERROR. PARSE SO FAR: " 

     " %q"     (for each state on the parser stack) 

     " _"      (represents the input cursor) 

     " %t"     (for each of the next 5 input tokens) 

The syntax analysis phase can produce these error messages without counting the symbols in the 
message or knowing their lengths because the error message editor takes care of breaking long output 
lines. 

In addition to selecting a basic error message string, an error number represents the severity level of 
the corresponding error: 

     error number     severity 

     1000-1999        error 
     2000-3999        serious error 
     4000-5999        fatal error 
     6000-6999        compiler error 

A fatal error or a compiler error will terminate the current phase, and no remaining phase (except the 
error message editor) will be invoked; in addition, a compiler error message is automatically preceded by 
the string 

     "COMPILER ERROR" 

A serious error allows the current phase to continue execution, but all remaining phases (except the error 
message editor) are skipped. 
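The mapping from error numbers to severity levels, and the consequences of each level, can be summarized in a short sketch. The enumeration and function names are hypothetical; the numeric ranges and the two rules are those given in the text.

```c
enum severity { SEV_NONE, SEV_ERROR, SEV_SERIOUS, SEV_FATAL, SEV_COMPILER };

/* Classify an error number by the range in which it falls. */
enum severity severity_of(int errnum)
{
    if (errnum >= 1000 && errnum <= 1999) return SEV_ERROR;
    if (errnum >= 2000 && errnum <= 3999) return SEV_SERIOUS;
    if (errnum >= 4000 && errnum <= 5999) return SEV_FATAL;
    if (errnum >= 6000 && errnum <= 6999) return SEV_COMPILER;
    return SEV_NONE;                     /* outside the defined ranges */
}

/* Fatal and compiler errors terminate the current phase at once. */
int stops_current_phase(enum severity s)
{
    return s == SEV_FATAL || s == SEV_COMPILER;
}

/* Serious errors and worse cause the remaining phases (other than
   the error message editor) to be skipped. */
int skips_later_phases(enum severity s)
{
    return s >= SEV_SERIOUS;
}
```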

The error message editor writes its output onto the standard output unit, which is normally the user's 
terminal in a time-sharing system or a line printer in a batch system. However, when the compiler is 
submitted as a batch job by a time-sharing user, this output is redirected onto an error listing file. This 
is accomplished by passing the argument "»«" to the error message editor, which indicates that output 
to the standard output unit is to be appended onto filecode EL (the error listing file). Redirection of 
standard input and output is a (not necessarily portable) feature of the C run-time system, rather than of 
the compiler itself. 

6. Invoking the Compiler Phases 

The mechanisms for invoking a phase of the compiler, passing arguments to it, and returning a completion 
code are operating system dependent. In general, the control program will be rewritten for each system 
on which the compiler runs; on some systems, the control program may be replaced by a set of job 
control cards (see Figure 1 on page 31). The source of the compiler phases need not be changed, 
however; the operating system dependencies associated with the invocation of a C program are isolated 
in two run-time routines, the startup routine and the exit routine. The startup routine receives control 
from the operating system, establishes the C run-time environment, and calls the C routine named MAIN. 
It is the responsibility of the startup routine to take the character string arguments, which may be 
provided by the operating system or written on a temporary file, and arrange them as an array of 
character strings which is then passed as an argument to MAIN. The exit routine EXIT is called upon a 
return from MAIN; it may also be called directly by a C program. The exit routine closes all open files 
and returns control to the operating system. EXIT has one optional argument, a return code, which it 
communicates to the control program as a completion or abort code or by writing it onto a temporary file. 

On UNIX, a phase of the compiler is invoked by calling the system routine FORK, which creates a new 
process, followed by a call in the new process to the system routine EXECL, which overwrites the process 
with the desired phase of the compiler and passes it a list of character strings as arguments. The old 
process waits for the execution of the compiler phase to finish by calling the system routine WAIT, which 
waits for the process to die and returns its completion code. 
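On a present-day POSIX system the same sequence can be sketched as below. This is an illustration using the modern fork, execl, and wait interfaces, which differ in detail from the UNIX system routines of the time; the routine name run_phase, its fixed two-argument form, and the use of 127 to signal a failed exec are assumptions.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at path with up to two string arguments and return
   its completion code, or -1 if it could not be run to completion. */
int run_phase(const char *path, const char *arg1, const char *arg2)
{
    int status;
    pid_t pid, w;

    pid = fork();                        /* create the new process */
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* New process: overwrite it with the desired phase. */
        execl(path, path, arg1, arg2, (char *)NULL);
        _exit(127);                      /* reached only if execl failed */
    }
    do                                   /* old process: wait for the phase */
        w = wait(&status);
    while (w != pid && w != -1);
    if (w == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A control program would call such a routine once per phase and stop invoking further phases when a nonzero completion code is returned.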

On GCOS, two methods are used to invoke a phase of the compiler from the control program, which runs 
in time-sharing. The first method uses a routine SYSTEM, a C-callable interface to the system call CALLSS 
which can invoke any time-sharing subsystem (program). The character string arguments are passed in 
the system teletype buffer (using the system call PSEUDO) so that to the invoked program it appears that 
it was invoked by a command typed at command level with those arguments. The completion code is 
stored (using the system call CORFIL) in the first word of the core file, a ten word buffer provided by the 
operating system for communication between a user's subsystems. The disadvantage of running the 
compiler phases in time-sharing is that the compiler phases, being large programs, can take a very large 
elapsed time to run. Thus this method is used only for the error message editor which prints error 
messages on the user's terminal. 

The second method uses a routine TASK, a C-callable interface to the TASK system call, to submit a 
program as a special, high-priority batch activity. The elapsed time for a TASK activity is typically much 
lower than for the same program run in time-sharing. The character string arguments are written onto a 
temporary file which is read by the startup routine when in batch. The completion code is handled as 
follows: if there is no argument to EXIT or the argument is 0, EXIT terminates normally and TASK will 
return a status code of 0. Otherwise, EXIT aborts with the completion code as the abort code; the abort 
code is then returned in the status code by TASK. 

The compiler phases can also be invoked as normal GCOS batch activities by the sequence of control 
cards shown in Figure 1. When these cards are submitted, IDENT and USERID cards are inserted at the 
beginning of the deck and the characters V and Y are replaced by the user's identification and the basic 
component of the source file name, respectively. Thus if the user is f and the source file is *8/TEST.C*, 
the assembly-language output will be written onto the file '3/TfSW and the error messages will be 
written onto the file t/TEST.E*. The generation of the control cards and the submission of the batch job 
is performed by a time-sharing program (command). As the turn-around time for a normal batch job can 
be quite long, this version of the compiler is used only for those programs which are too large to compile 
using the other version of the compiler. 



